Abstract

We consider low-rank reconstruction of a matrix using a subset of its columns and we present asymptotically optimal algorithms for both spectral norm and Frobenius norm reconstruction. The main tools we introduce to obtain our results are: (i) the use of fast approximate SVD-like decompositions for column-based matrix reconstruction, and (ii) two deterministic algorithms for selecting rows from matrices with orthonormal columns, building upon the sparse representation theorem for decompositions of the identity that appeared in [1].
1 Introduction



The best rank $k$ approximation to a matrix $A \in \mathbb{R}^{m \times n}$ is $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$, where $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_k \geq 0$ are the top $k$ singular values of $A$, with associated left and right singular vectors $u_i \in \mathbb{R}^m$ and $v_i \in \mathbb{R}^n$ respectively. (See Section 1.1 for notation.) The singular values and singular vectors of $A$ can be computed via the Singular Value Decomposition (SVD) of $A$ in $O(mn \min\{m, n\})$ time.
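As a small illustration of the definition above, the following sketch computes $A_k$ from a thin SVD with NumPy. The helper name `best_rank_k` and the random test matrix are assumptions made for this example only; they are not part of the paper.

```python
import numpy as np

def best_rank_k(A, k):
    """Return A_k, the best rank-k approximation of A, from a thin SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the top k singular triplets: sum_i sigma_i * u_i * v_i^T.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))   # illustrative dense matrix
k = 2
A_k = best_rank_k(A, k)

sigma = np.linalg.svd(A, compute_uv=False)
spectral_error = np.linalg.norm(A - A_k, 2)
frobenius_error_sq = np.linalg.norm(A - A_k, "fro") ** 2
```

Under these assumptions, `spectral_error` should agree with the $(k+1)$-th singular value and `frobenius_error_sq` with the sum of the squared trailing singular values, matching the identities recalled in Section 2.1.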

There is considerable interest (e.g. [4, 6, 8, 9, 11, 15, 19, 20, 21]) in determining a minimum set of $r \ll n$ columns of $A$ which is approximately as good as $A_k$ at reconstructing $A$. Such columns are important for interpreting data [21], building robust machine learning algorithms [4], etc.

Let $A \in \mathbb{R}^{m \times n}$ and let $C \in \mathbb{R}^{m \times r}$ consist of $r$ columns of $A$ for some $k < r < n$. We are interested in the reconstruction errors

$$\|A - CC^+A\|_\xi \quad \text{and} \quad \|A - \Pi^\xi_{C,k}(A)\|_\xi,$$

for $\xi = 2, F$ (see Section 1.1 for notation). The former is the reconstruction error for $A$ using the columns in $C$; the latter is the error from the best rank $k$ reconstruction of $A$ (under the appropriate norm) within the column space of $C$. For fixed $A$, $k$, and $r$, we would like these errors to be as close to

$$\|A - A_k\|_\xi$$

as possible. We present polynomial-time near-optimal constructions for arbitrary $r > k$, settling important open questions regarding column-based matrix reconstruction.
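To make the first error measure concrete, here is a minimal NumPy sketch that forms $C$ from a hand-picked set of columns and evaluates $\|A - CC^+A\|_F$. The column indices in `cols` and the random matrix are hypothetical choices for illustration; the sketch does not implement any of the selection algorithms of this paper.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 7))   # illustrative dense matrix
r = 4
cols = [0, 2, 3, 6]                # hypothetical choice of r columns
C = A[:, cols]

# C C^+ A is the projection of A onto the column space of C.
residual = A - C @ np.linalg.pinv(C) @ A
err_columns = np.linalg.norm(residual, "fro")

# C C^+ A has rank at most r, so it cannot beat the best rank-r approximation.
sigma = np.linalg.svd(A, compute_uv=False)
err_best_rank_r = np.sqrt(np.sum(sigma[r:] ** 2))
```

The selected columns themselves are reconstructed exactly, so the corresponding columns of `residual` vanish; the remaining columns carry the whole error.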

• Spectral norm: What is the best reconstruction error with r > k columns? We present 
polynomial-time (deterministic and randomized) algorithms with approximation error asymp- 
totically matching a lower bound proven in this work. Prior work had focused on the r = k 
case and presented near-optimal polynomial-time algorithms [6, 17]. 

• Frobenius norm: How many columns are needed for relative error approximation, i.e. a reconstruction error of $(1 + \epsilon)\|A - A_k\|_F$, for $\epsilon > 0$? We show that $O(k/\epsilon)$ columns contain a rank-$k$ subspace which reconstructs $A$ to relative error, and we present the first sub-SVD (in terms of running time) randomized algorithm to identify these columns. This matches the $\Omega(k/\epsilon)$ lower bound in [8] and improves the best known upper bound of $O(k \log k + k/\epsilon)$ [6, 8, 12, 23].

1.1 Notation 

$A, B, \ldots$ are matrices; $a, b, \ldots$ are column vectors. $I_n$ is the $n \times n$ identity matrix; $0_{m \times n}$ is the $m \times n$ matrix of zeros; $1_n$ is the $n \times 1$ vector of ones; $e_i$ is the standard basis (whose dimensionality will be clear from the context); $\mathrm{rank}(A)$ is the rank of $A$. The Frobenius and the spectral matrix-norms are: $\|A\|_F^2 = \sum_{i,j} A_{ij}^2$ and $\|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2$; $\|A\|_\xi$ is used if a result holds for both norms $\xi = 2$ and $\xi = F$. The Singular Value Decomposition (SVD) of $A$, with $\mathrm{rank}(A) = \rho$, is

$$A = \begin{pmatrix} U_k & U_{\rho-k} \end{pmatrix} \begin{pmatrix} \Sigma_k & 0 \\ 0 & \Sigma_{\rho-k} \end{pmatrix} \begin{pmatrix} V_k^T \\ V_{\rho-k}^T \end{pmatrix},$$

with singular values $\sigma_1 \geq \ldots \geq \sigma_k \geq \sigma_{k+1} \geq \ldots \geq \sigma_\rho > 0$. We will use $\sigma_i(A)$ to denote the $i$-th singular value of $A$ when the matrix is not clear from the context. The matrices $U_k \in \mathbb{R}^{m \times k}$ and $U_{\rho-k} \in \mathbb{R}^{m \times (\rho-k)}$ contain the left singular vectors of $A$, and, similarly, the matrices $V_k \in \mathbb{R}^{n \times k}$ and $V_{\rho-k} \in \mathbb{R}^{n \times (\rho-k)}$ contain the right singular vectors of $A$. It is well-known that $A_k = U_k \Sigma_k V_k^T$ minimizes $\|A - X\|_\xi$ over all matrices $X \in \mathbb{R}^{m \times n}$ of rank at most $k$. We use $A_{\rho-k}$ to denote the matrix $A - A_k = U_{\rho-k} \Sigma_{\rho-k} V_{\rho-k}^T$. Also, $A^+ = V \Sigma^{-1} U^T$ denotes the Moore-Penrose pseudo-inverse of $A$. For a symmetric positive definite matrix $A = BB^T$, $\lambda_i(A) = \sigma_i^2(B)$ denotes the $i$-th eigenvalue of $A$.

Finally, given a matrix $A \in \mathbb{R}^{m \times n}$ and a matrix $C \in \mathbb{R}^{m \times r}$ with $r > k$, we formally define the matrix $\Pi^\xi_{C,k}(A) \in \mathbb{R}^{m \times n}$ as the best approximation to $A$ within the column space of $C$ that has rank at most $k$. $\Pi^\xi_{C,k}(A)$ minimizes the residual $\|A - \hat{A}\|_\xi$, over all $\hat{A}$ in the column space of $C$ that have rank at most $k$ (one can write $\Pi^\xi_{C,k}(A) = CX$ where $X \in \mathbb{R}^{r \times n}$ has rank at most $k$). In general, $\Pi^2_{C,k}(A) \neq \Pi^F_{C,k}(A)$; Section 2.2 discusses the computation of $\Pi^\xi_{C,k}(A)$.

1.2 Our main results 

Since $\|A - CC^+A\|_\xi \leq \|A - \Pi^\xi_{C,k}(A)\|_\xi$, we will state all our bounds in terms of the latter quantity. Note that we chose to state our Frobenius norm bounds in terms of the square of the Frobenius norm; this choice facilitates comparisons with prior work and simplifies our proofs.

Theorem 1 (Deterministic spectral norm reconstruction). Given $A \in \mathbb{R}^{m \times n}$ of rank $\rho$ and a target rank $k < \rho$, there exists a deterministic polynomial-time algorithm to select $r > k$ columns of $A$ and form a matrix $C \in \mathbb{R}^{m \times r}$ such that

$$\|A - \Pi^2_{C,k}(A)\|_2 \leq \left(1 + \frac{1 + \sqrt{(\rho-k)/r}}{1 - \sqrt{k/r}}\right) \|A - A_k\|_2 = O\left(\sqrt{\rho/r}\right) \|A - A_k\|_2.$$

The matrix $C$ can be computed in $T_{SVD} + O\left(rn\left(k^2 + (\rho-k)^2\right)\right)$ time, where $T_{SVD}$ is the time needed to compute all $\rho$ right singular vectors of $A$.

Our algorithm uses the matrices $V_k$ and $V_{\rho-k}$ of the right singular vectors of $A$. These matrices can be computed in $O(mn \min\{m, n\})$ time via the SVD. The asymptotic multiplicative error of the above theorem matches a lower bound that we prove in Section 9.1. This is the first spectral reconstruction algorithm with asymptotically optimal guarantees for arbitrary $r > k$. Previous work presented near-optimal algorithms for $r = k$ [17]. We note that in Section 4 we will present a result that achieves a slightly worse error bound (essentially replacing $\rho$ by $n$ in the accuracy guarantee), but only uses the top $k$ right singular vectors of $A$ (i.e., the matrix $V_k$).

Theorem 2 (Deterministic Frobenius norm reconstruction). Given $A \in \mathbb{R}^{m \times n}$ of rank $\rho$ and a target rank $k < \rho$, there exists a deterministic polynomial-time algorithm to select $r > k$ columns of $A$ and form a matrix $C \in \mathbb{R}^{m \times r}$ such that

$$\|A - \Pi^F_{C,k}(A)\|_F^2 \leq \left(1 + \frac{1}{\left(1 - \sqrt{k/r}\right)^2}\right) \|A - A_k\|_F^2.$$

The matrix $C$ can be computed in $T_{V_k} + O(mn + nrk^2)$ time, where $T_{V_k}$ is the time needed to compute the top $k$ right singular vectors of $A$.

Our bound implies a constant-factor approximation. Previous work presents deterministic near-optimal algorithms for $r = k$ [6]; we are unaware of any deterministic algorithms for $r > k$.

The next two theorems guarantee (up to small constant factors) the same bounds as Theorems 1 
and 2, but the proposed algorithms are considerably more efficient. In particular, there is no need 
to exactly compute the right singular vectors of A, because approximations suffice. 

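Theorem 3 (Fast spectral norm reconstruction). Given $A \in \mathbb{R}^{m \times n}$ of rank $\rho$, a target rank $2 \leq k < \rho$, and $0 < \epsilon < 1$, there exists a randomized algorithm to select $r > k$ columns of $A$ and form a matrix $C \in \mathbb{R}^{m \times r}$ such that

$$\mathbb{E}\left[\|A - \Pi^2_{C,k}(A)\|_2\right] \leq \left(\sqrt{2} + \epsilon\right) \left(1 + \frac{1 + \sqrt{n/r}}{1 - \sqrt{k/r}}\right) \|A - A_k\|_2 = O\left(\sqrt{n/r}\right) \|A - A_k\|_2.$$

The matrix $C$ can be computed in $O\left(mnk\epsilon^{-1} \log\left(k^{-1} \min\{m, n\}\right) + nrk^2\right)$ time.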

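Theorem 4 (Fast Frobenius norm reconstruction). Given $A \in \mathbb{R}^{m \times n}$ of rank $\rho$, a target rank $2 \leq k < \rho$, and $0 < \epsilon < 1$, there exists a randomized algorithm to select $r > k$ columns of $A$ and form a matrix $C \in \mathbb{R}^{m \times r}$ such that

$$\mathbb{E}\left[\|A - \Pi^F_{C,k}(A)\|_F^2\right] \leq (1 + \epsilon) \left(1 + \frac{1}{\left(1 - \sqrt{k/r}\right)^2}\right) \|A - A_k\|_F^2.$$

The matrix $C$ can be computed in $O\left(mnk\epsilon^{-1} + nrk^2\right)$ time.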

Our last, yet perhaps most interesting result, guarantees relative-error Frobenius norm approximation by combining the algorithm of Theorem 4 with one round of adaptive sampling [8, 9]. This is the first relative-error approximation for Frobenius norm reconstruction that uses a linear number of columns in $k$ (the target rank). Previous work [12, 23, 8, 6] achieves relative error with $O(k \log k + k/\epsilon)$ columns. Our result asymptotically matches the $\Omega(k/\epsilon)$ lower bound in [8].

Theorem 5 (Fast relative-error Frobenius norm reconstruction). Given $A \in \mathbb{R}^{m \times n}$ of rank $\rho$, a target rank $2 \leq k < \rho$, and $0 < \epsilon < 1$, there exists a randomized algorithm to select at most

$$\frac{2k}{\epsilon}$$

columns of $A$ and form a matrix $C \in \mathbb{R}^{m \times r}$ such that

$$\mathbb{E}\left[\|A - \Pi^F_{C,k}(A)\|_F^2\right] \leq (1 + \epsilon) \|A - A_k\|_F^2.$$

The matrix $C$ can be computed in $O\left((mnk + nk^3)\epsilon^{-2/3}\right)$ time.



r     | Spectral norm ($\xi = 2$) | Frobenius norm ($\xi = F$)
r = k | n/k [6]                   | k + 1 [9]
r > k | n/r (Section 9.1)         | 1 + k/r [8] (and Section 9.2)

Table 1: Lower bounds for the approximation ratio $\|A - \Pi^\xi_{C,k}(A)\|_\xi^2 / \|A - A_k\|_\xi^2$.



Running times. Our running times are stated in terms of the number of operations needed to compute the matrix $C$, and, for simplicity, we assume that $A$ is dense; if $A$ is sparse, additional savings might be possible. Our accuracy guarantees are in terms of the optimal matrix $\Pi^\xi_{C,k}(A)$, which would require additional time to compute. For the Frobenius norm, computing $\Pi^F_{C,k}(A)$ is straightforward, and only requires an additional $O(mnr + (m+n)r^2)$ time (see the discussion in Section 2.2). For the spectral norm, we are not aware of any algorithm to compute $\Pi^2_{C,k}(A)$ exactly. In Section 2.2 we present a simple approach that computes $\hat{\Pi}^2_{C,k}(A)$, a constant-factor approximation to $\Pi^2_{C,k}(A)$, in $O(mnr + (m+n)r^2)$ time. Our bounds in Theorems 1 and 3 can be restated in terms of the error $\|A - \hat{\Pi}^2_{C,k}(A)\|_2$; the accuracy guarantees only weaken by small constant factors.



1.3 Lower Bounds 

Table 1 provides a summary of lower bounds for the ratio

$$\frac{\|A - \Pi^\xi_{C,k}(A)\|_\xi^2}{\|A - A_k\|_\xi^2},$$

where $C$ is a matrix consisting of $r$ columns of $A$, with $r > k$. Theorem 34 contributes a new lower bound for the spectral norm case when $r > k$. Note that any lower bound for the ratio $\|A - CC^+A\|_\xi^2 / \|A - A_k\|_\xi^2$ implies the same lower bound for $\|A - \Pi^\xi_{C,k}(A)\|_\xi^2 / \|A - A_k\|_\xi^2$; the converse, however, is not true.



1.4 Prior results on column-based matrix reconstructions 

There is a long literature on algorithms for column-based matrix reconstruction using r > k 
columns. The first result goes back to [16], with the most recent one being, to the best of our 
knowledge, the work in [6]. 



1.4.1 The Frobenius norm case 

We present known upper bounds for the approximation ratio $\|A - \Pi^F_{C,k}(A)\|_F^2 / \|A - A_k\|_F^2$. We
start with the r = k case. [3] describes a Tv^. + O [nk + k^ (log^ A;) (log log fe)) time randomized 
algorithm which provides an upper bound O (ji log2 k^ with constant probability. This bound was 

subsequently improved in [6]. More precisely. Theorem 8 of [6] gives a (A; + 1) deterministic approx- 
imation running in 0{knm^ logm) time; this upper bound matches the lower bound in [9]. [6] also 

presents three randomized algorithms such that $\mathbb{E}\left[\|A - \Pi^F_{C,k}(A)\|_F^2\right] = (k+1)\|A - A_k\|_F^2$. These
randomized algorithms are presented in Theorem 7, Proposition 16, and Proposition 18 and run 

in O (A;nm'^ log m) , O {kn^m + kn^ log rij , and O {kTsvD ^ knm?^ time, respectively. Moreover, 
Theorem 9 in [6] presents an O [mn log nk'^e~^ + n log^ n ■ k'^e^^ log (fce^^ log n)) time randomized 
algorithm such that, with constant probability, $\|A - \Pi^F_{C,k}(A)\|_F^2 \leq (1 + \epsilon) \cdot (k+1) \|A - A_k\|_F^2$, for any $0 < \epsilon < 1$. Finally, [18] improved upon the running time of the results in [6]. More precisely, Theorem 2 in [18] gives an $O(knm^2)$ time randomized algorithm with a $(k+1)$ multiplicative error (in expectation).

When $r = \Omega(k \log k)$, relative-error approximations are known. [12] presented the first result that achieved such a bound, using random sampling of the columns of $A$ according to the Euclidean norms of the rows of $V_k$. More specifically, a $(1 + \epsilon)$-approximation was proven using
r = 17 (/ce"^ log (A;e~^)) columns in Tv^. + 0(kn + rlogr) time. [23] argued that the same tech- 
nique gives a (1 + e)-approximation using r = (/clog A; -|- ker^^ columns. It also showed how to 
improve the running time to Ty^ + 0{kn + rlogr), where Vfc G W^^^ contains the right singular 
vectors of an approximation to and can be computed in o(mnmin{m, n}) time, which is less 
than the time needed to compute the SVD of A. In [8], the authors leveraged volume sampling
and presented an approach that achieves a relative error approximation using 0(A;^logA; + ke^^) 
columns in 0{mnk'^\ogk) time. Also, it is possible to combine the fast volume sampling approach 
in [6] (setting, for example, e = 1/2) with O(logfc) rounds of adaptive sampling as described in [8] 
to achieve a relative error approximation using O (/clog /c -|- /ce~^) columns. The running time of 
this combined algorithm is O {rnnk"^ log n + nk^ log'^ n ■ log [k log n)) . The techniques in [12] do not 
apply to general $r > k$, since $\Omega(k \log k)$ columns must be sampled in order to preserve rank with
random sampling. 

A related line of work (including [7, 13, 14, 24]) has focused on the construction of coresets and sketches for high dimensional subspace approximation with respect to general $\ell_p$ norms. In
our setting, p = 2 corresponds to Frobenius norm matrix reconstruction, and Theorem 1.3 of [24] 
presents an exponential in k/e algorithm to select O (/c^e~^ log (A;/e)) columns that guarantee a 
relative error approximation. It would be interesting to understand if the techniques of [7, 13, 14, 24] 
can be extended to match our results here in the special case of p = 2. 

The recent work in [18] presents a deterministic and a randomized algorithm for arbitrary r > k 
that guarantee upper bounds for the ratio $\|A - CC^+A\|_F^2 / \|A - A_k\|_F^2$. More precisely, Theorem
1 in [18] presents an O (rnm^ log m) time deterministic algorithm with bound {r + l)/{r + 1 — k), 
which is tight up to low order terms if r = o(n). Also, Theorem 2 in [18] presents an 0{rnm?') 
time randomized algorithm which achieves the same bound in expectation. We should notice that 
it is not obvious how to extend the results in [18] to obtain comparable bounds for the ratio $\|A - \Pi^F_{C,k}(A)\|_F^2 / \|A - A_k\|_F^2$.

1.4.2 The spectral norm case 

We present known guarantees for the approximation ratio $\|A - \Pi^2_{C,k}(A)\|_2^2 / \|A - A_k\|_2^2$. In general, results for spectral norm have been sparse. When $r = k$, the strongest bound emerges from Strong Rank Revealing QR (RRQR) [17] (specifically Algorithm 4 in [17]), which, for $f > 1$, runs in $O(mnk \log_f n)$ time and guarantees an $f^2 k(n-k) + 1$ approximation. For $r > k$, to the best of our knowledge, there is no easy way to extend the RRQR guarantees. In fact we are only aware of one bound that is applicable to this domain, other than those obtained by trivially extending the Frobenius norm bounds, because any $\alpha$-approximation in the Frobenius norm gives an $\alpha(\rho-k)$-approximation in the spectral norm:

$$\|A - \Pi^2_{C,k}(A)\|_2^2 \leq \|A - \Pi^F_{C,k}(A)\|_2^2 \leq \|A - \Pi^F_{C,k}(A)\|_F^2 \leq \alpha \|A - A_k\|_F^2 \leq \alpha(\rho-k) \|A - A_k\|_2^2.$$

That exception is the recent work of [2], which describes a deterministic $T_{V_k} + O(nk(n-r))$ time algorithm that guarantees approximation error $2 + k(n-r)/(r-k+1)$ for any $r > k$.
2 Matrix norm properties and the computation of $\Pi^\xi_{C,k}(A)$

2.1 Matrix norm properties 

Recall notation from Section 1.1; for any matrix $A$ of rank at most $\rho$, it is well-known that $\|A\|_F^2 = \sum_{i=1}^{\rho} \sigma_i^2(A)$ and $\|A\|_2 = \sigma_1(A)$. Also, the best rank $k$ approximation to $A$ satisfies $\|A - A_k\|_2 = \sigma_{k+1}(A)$ and $\|A - A_k\|_F^2 = \sum_{i=k+1}^{\rho} \sigma_i^2(A)$. For any two matrices $A$ and $B$ of appropriate dimensions, $\|A\|_2 \leq \|A\|_F \leq \sqrt{\rho}\,\|A\|_2$, $\|AB\|_F \leq \|A\|_F \|B\|_2$, and $\|AB\|_F \leq \|A\|_2 \|B\|_F$. The latter two properties are stronger versions of the standard submultiplicativity property. We refer to the next lemma as matrix-Pythagoras:

Lemma 6. If $X, Y \in \mathbb{R}^{m \times n}$ and $XY^T = 0_{m \times m}$ or $X^TY = 0_{n \times n}$, then

$$\|X + Y\|_F^2 = \|X\|_F^2 + \|Y\|_F^2,$$
$$\max\{\|X\|_2^2, \|Y\|_2^2\} \leq \|X + Y\|_2^2 \leq \|X\|_2^2 + \|Y\|_2^2.$$

Proof. Since $XY^T = 0_{m \times m}$, $(X + Y)(X + Y)^T = XX^T + YY^T$. For $\xi = F$,

$$\|X + Y\|_F^2 = \mathrm{Tr}\left((X + Y)(X + Y)^T\right) = \mathrm{Tr}\left(XX^T + YY^T\right) = \|X\|_F^2 + \|Y\|_F^2.$$

Let $z$ be any vector in $\mathbb{R}^m$. For $\xi = 2$,

$$\|X + Y\|_2^2 = \max_{\|z\|_2 = 1} z^T (X + Y)(X + Y)^T z = \max_{\|z\|_2 = 1} \left(z^T XX^T z + z^T YY^T z\right).$$

We have that $\max_{\|z\|_2 = 1} \left(z^T XX^T z + z^T YY^T z\right)$ is at most

$$\max_{\|z\|_2 = 1} z^T XX^T z + \max_{\|z\|_2 = 1} z^T YY^T z = \|X\|_2^2 + \|Y\|_2^2,$$

and that

$$\max_{\|z\|_2 = 1} \left(z^T XX^T z + z^T YY^T z\right) \geq \max_{\|z\|_2 = 1} z^T XX^T z = \|X\|_2^2,$$

since $z^T YY^T z$ is non-negative for any vector $z$. We get the same lower bound with $\|Y\|_2^2$ instead, which means we can lower bound with $\max\{\|X\|_2^2, \|Y\|_2^2\}$. The case with $X^TY = 0_{n \times n}$ can be proven similarly. ■
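The statement of matrix-Pythagoras can be checked numerically. The sketch below is only an illustration: it builds $X$ and $Y$ with $XY^T = 0$ by multiplying random matrices with complementary orthogonal projectors (a construction chosen for this example, not taken from the paper) and evaluates both sides of the two claims.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 7

# P is an orthogonal projector on R^n; X lives on range(P), Y on its complement,
# so X @ Y.T = M P (I - P) N^T = 0.
basis, _ = np.linalg.qr(rng.standard_normal((n, 3)))
P = basis @ basis.T
X = rng.standard_normal((m, n)) @ P
Y = rng.standard_normal((m, n)) @ (np.eye(n) - P)

def fro_sq(M):
    return np.linalg.norm(M, "fro") ** 2

def spec_sq(M):
    return np.linalg.norm(M, 2) ** 2

frobenius_gap = fro_sq(X + Y) - (fro_sq(X) + fro_sq(Y))   # expected to be ~0
spectral_lower = max(spec_sq(X), spec_sq(Y))
spectral_upper = spec_sq(X) + spec_sq(Y)
spectral_mid = spec_sq(X + Y)
```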

2.2 Computing the best rank k approximation $\Pi^\xi_{C,k}(A)$

Let $A \in \mathbb{R}^{m \times n}$, let $k < n$ be an integer, and let $C \in \mathbb{R}^{m \times r}$ with $r > k$. Recall that $\Pi^\xi_{C,k}(A) \in \mathbb{R}^{m \times n}$ is the best rank $k$ approximation to $A$ in the column space of $C$: we can write $\Pi^\xi_{C,k}(A) = CX^\xi$, where

$$X^\xi = \underset{\Psi \in \mathbb{R}^{r \times n} :\, \mathrm{rank}(\Psi) \leq k}{\mathrm{argmin}} \|A - C\Psi\|_\xi^2.$$

In order to compute (or approximate) $\Pi^\xi_{C,k}(A)$ given $A$, $C$, and $k$, we will use the following algorithm:

1: Orthonormalize the columns of $C$ in $O(mr^2)$ time to construct the matrix $Q \in \mathbb{R}^{m \times r}$.

2: Compute $(Q^TA)_k \in \mathbb{R}^{r \times n}$, the best rank-$k$ approximation of $Q^TA$, via SVD in $O(mnr + nr^2)$ time.

3: Return $Q(Q^TA)_k \in \mathbb{R}^{m \times n}$ in $O(mnk)$ time.

Clearly, $Q(Q^TA)_k$ is a rank $k$ matrix that lies in the column span of $C$. Note that though $\Pi^\xi_{C,k}(A)$ can depend on $\xi$, our algorithm computes the same matrix, independent of $\xi$. The next lemma, which is essentially Lemma 4.3 in [5] together with a slight improvement of Theorem 9.3 in [19], proves that this algorithm computes $\Pi^F_{C,k}(A)$ and a constant factor approximation to $\Pi^2_{C,k}(A)$.
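The three steps of the algorithm above can be sketched directly in NumPy. This is a minimal illustration under the assumption that $C$ has full column rank, so that a reduced QR factorization yields an orthonormal basis of its column space; the function name `best_rank_k_in_span` and the chosen column indices are hypothetical.

```python
import numpy as np

def best_rank_k_in_span(A, C, k):
    """Return Q (Q^T A)_k for an orthonormal basis Q of the columns of C.

    Assumes C has full column rank (otherwise the reduced QR basis may span
    more than the column space of C).
    """
    Q, _ = np.linalg.qr(C)                                   # step 1: m x r basis
    U, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)   # step 2: SVD of Q^T A
    QtA_k = (U[:, :k] * s[:k]) @ Vt[:k, :]                   # best rank-k part
    return Q @ QtA_k                                         # step 3: m x n result

rng = np.random.default_rng(4)
A = rng.standard_normal((9, 6))
C = A[:, [0, 1, 3, 5]]      # hypothetical selection of r = 4 columns
k = 2
P = best_rank_k_in_span(A, C, k)

err_projection = np.linalg.norm(A - P, "fro")
err_columns = np.linalg.norm(A - C @ np.linalg.pinv(C) @ A, "fro")
```

In this sketch `err_projection` is never smaller than `err_columns`, in line with the inequality stated at the start of Section 1.2.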
Lemma 7. Given $A \in \mathbb{R}^{m \times n}$, $C \in \mathbb{R}^{m \times r}$ and an integer $k$, the matrix $Q(Q^TA)_k \in \mathbb{R}^{m \times n}$ described above (where $Q$ is an orthonormal basis for the columns of $C$) can be computed in $O(mnr + (m+n)r^2)$ time and satisfies:

$$\|A - Q(Q^TA)_k\|_F^2 = \|A - \Pi^F_{C,k}(A)\|_F^2,$$
$$\|A - Q(Q^TA)_k\|_2^2 \leq 2\|A - \Pi^2_{C,k}(A)\|_2^2.$$

Proof. Our proof for the Frobenius norm case is a mild modification of the proof of Lemma 4.3 [5]. First, note that $\Pi^F_{C,k}(A) = \Pi^F_{Q,k}(A)$, because $Q \in \mathbb{R}^{m \times r}$ is an orthonormal basis for the column space of $C$. Thus,

$$\|A - \Pi^F_{C,k}(A)\|_F^2 = \|A - \Pi^F_{Q,k}(A)\|_F^2 = \min_{\Psi :\, \mathrm{rank}(\Psi) \leq k} \|A - Q\Psi\|_F^2.$$

Now, using matrix-Pythagoras and the orthonormality of $Q$,

$$\|A - Q\Psi\|_F^2 = \|A - QQ^TA + Q(Q^TA - \Psi)\|_F^2 = \|A - QQ^TA\|_F^2 + \|Q^TA - \Psi\|_F^2.$$

Setting $\Psi = (Q^TA)_k$ minimizes the above quantity over all rank-$k$ matrices $\Psi$. Thus, combining the above results, $\|A - \Pi^F_{C,k}(A)\|_F^2 = \|A - Q(Q^TA)_k\|_F^2$.

We now proceed to the spectral-norm part of the proof, which combines ideas from Theorem 9.3 [19] and matrix-Pythagoras. We first manipulate the term $\|A - Q(Q^TA)_k\|_2^2$,

$$\|A - Q(Q^TA)_k\|_2^2 = \|A - QQ^TA + Q(Q^TA - (Q^TA)_k)\|_2^2$$
$$\leq \|A - QQ^TA\|_2^2 + \|QQ^TA - (QQ^TA)_k\|_2^2$$
$$\overset{(a)}{\leq} \|A - \Pi^2_{Q,k}(A)\|_2^2 + \|A - A_k\|_2^2$$
$$\leq 2\|A - \Pi^2_{Q,k}(A)\|_2^2.$$

The first inequality follows from the simple fact that $(QQ^TA)_k = Q(Q^TA)_k$ and matrix-Pythagoras; the first term in (a) follows because $QQ^TA$ is the (unconstrained, not necessarily of rank at most $k$) best approximation to $A$ in the column space of $Q$; the second term in (a) follows because $QQ^T$ is a projector matrix and thus

$$\|QQ^TA - (QQ^TA)_k\|_2^2 = \sigma_{k+1}^2(QQ^TA) \leq \sigma_{k+1}^2(A) = \|A - A_k\|_2^2.$$

The last inequality follows because $\|A - A_k\|_2^2 \leq \|A - \Pi^2_{Q,k}(A)\|_2^2$. Since $Q$ and $C$ have the same column space, $\Pi^2_{Q,k}(A) = \Pi^2_{C,k}(A)$, which gives the claim. ■



3 Main Tools 

Our two main tools are the use of matrix factorizations for column-based low-rank matrix recon- 
struction, and two deterministic sparsification lemmas which extend the work of [1]. 



3.1 Matrix factorizations 

Our first tool suggests how to use matrix factorizations to reconstruct a matrix from a subset of its columns: Lemmas 8, 10, and 11. Lemmas 10 and 11 present factorizations of the matrix $A \in \mathbb{R}^{m \times n}$ of the form

$$A = BZ^T + E,$$

where $B \in \mathbb{R}^{m \times k}$, $Z \in \mathbb{R}^{n \times k}$, $E \in \mathbb{R}^{m \times n}$, and $Z$ consists of orthonormal columns. Lemma 8 shows how to apply these factorizations by drawing a connection between matrix factorizations and column selection. Lemma 8 is the starting point of all our column reconstruction results.
Lemma 8. Let $A = BZ^T + E$, with $EZ = 0_{m \times k}$ and $Z^TZ = I_k$. Let $S \in \mathbb{R}^{n \times r}$ be any matrix such that $\mathrm{rank}(Z^TS) = \mathrm{rank}(Z) = k$. Let $C = AS \in \mathbb{R}^{m \times r}$. Then,

$$\|A - \Pi^\xi_{C,k}(A)\|_\xi^2 \leq \|E\|_\xi^2 + \|ES(Z^TS)^+\|_\xi^2.$$

Proof. The optimality of $\Pi^\xi_{C,k}(A)$ implies that $\|A - \Pi^\xi_{C,k}(A)\|_\xi^2 \leq \|A - X\|_\xi^2$ over all matrices $X \in \mathbb{R}^{m \times n}$ of rank at most $k$ in the column space of $C$. Consider the matrix $X = C(Z^TS)^+Z^T$ (clearly $X$ is in the column space of $C$ and $\mathrm{rank}(X) \leq k$ because $Z \in \mathbb{R}^{n \times k}$):

$$\|A - C(Z^TS)^+Z^T\|_\xi^2 = \|\underbrace{BZ^T + (A - BZ^T)}_{A} - \underbrace{(BZ^T + E)S}_{C = AS}(Z^TS)^+Z^T\|_\xi^2$$
$$= \|BZ^T - BZ^TS(Z^TS)^+Z^T + E - ES(Z^TS)^+Z^T\|_\xi^2$$
$$\overset{(a)}{=} \|E - ES(Z^TS)^+Z^T\|_\xi^2$$
$$\overset{(b)}{\leq} \|E\|_\xi^2 + \|ES(Z^TS)^+Z^T\|_\xi^2.$$

(a) follows because, by assumption, $\mathrm{rank}(Z^TS) = k$, and thus $(Z^TS)(Z^TS)^+ = I_k$, which implies $BZ^T - B(Z^TS)(Z^TS)^+Z^T = 0_{m \times n}$. (b) follows by matrix-Pythagoras because $ES(Z^TS)^+Z^TE^T = 0_{m \times m}$ (recall that $E = A - BZ^T$ and $EZ = 0_{m \times k}$ by assumption). The lemma follows by strong submultiplicativity because $Z$ has orthonormal columns, hence $\|Z\|_2 = 1$. ■

In this work, we view $C$ as a dimensionally-reduced or sampled sketch of $A$; $S$ is the dimension-reduction or sampling matrix. In words, Lemma 8 argues that if the matrix $S$ preserves the rank of an approximate factorization of the original matrix $A$, then the reconstruction of $A$ from $C = AS$ has an error that is essentially proportional to the error of the approximate factorization. The importance of this lemma is that it indicates an algorithm for matrix reconstruction using a subset of the columns of $A$: first, compute any factorization of the form $A = BZ^T + E$ satisfying the assumptions of the lemma; then, compute a sampling matrix $S$ which satisfies the rank assumption and controls the error $\|ES(Z^TS)^+\|_\xi$.

An immediate corollary of Lemma 8 emerges by considering the SVD of $A$. More specifically, consider the following factorization of $A$: $A = AV_kV_k^T + (A - A_k)$, where $V_k$ is the matrix of the top $k$ right singular vectors of $A$. In the parlance of Lemma 8, $Z = V_k$, $B = AV_k$, $E = A - A_k$, and clearly $EZ = 0_{m \times k}$.

Lemma 9. Let $S \in \mathbb{R}^{n \times r}$ be a matrix such that $\mathrm{rank}(V_k^TS) = k$. Let $C = AS$; then,

$$\|A - \Pi^\xi_{C,k}(A)\|_\xi^2 \leq \|A - A_k\|_\xi^2 + \|(A - A_k)S(V_k^TS)^+\|_\xi^2.$$
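The bound of Lemma 9 can be sanity-checked numerically in the Frobenius norm, where $\Pi^F_{C,k}(A)$ is computable exactly via Lemma 7. In the sketch below the sampling matrix $S$ simply picks four unscaled columns; this choice, the helper `best_rank_k`, and the random matrix are assumptions for illustration, not the selection procedure analysed in this paper.

```python
import numpy as np

def best_rank_k(M, k):
    """Best rank-k approximation of M via a thin SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(3)
m, n, k = 10, 8, 2
A = rng.standard_normal((m, n))

# Z = V_k and E = A - A_k, as in the factorization preceding Lemma 9.
_, _, Vt = np.linalg.svd(A, full_matrices=False)
V_k = Vt[:k, :].T
E = A - best_rank_k(A, k)

# Hypothetical sampling matrix: r = 4 standard basis vectors as columns.
cols = [0, 2, 5, 7]
S = np.eye(n)[:, cols]
rank_ok = np.linalg.matrix_rank(V_k.T @ S) == k   # rank assumption of Lemma 9

C = A @ S
Q, _ = np.linalg.qr(C)
projection = Q @ best_rank_k(Q.T @ A, k)          # Pi^F_{C,k}(A), by Lemma 7

lhs = np.linalg.norm(A - projection, "fro") ** 2
rhs = (np.linalg.norm(E, "fro") ** 2
       + np.linalg.norm(E @ S @ np.linalg.pinv(V_k.T @ S), "fro") ** 2)
```

When `rank_ok` holds, `lhs` should not exceed `rhs`; how tight the bound is depends entirely on how well $S$ controls the second term.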

The above lemma will be useful for designing the deterministic (spectral norm and Frobenius norm) column-reconstruction algorithms of Theorems 1 and 2. However, computing the SVD is costly and thus we would like to design a factorization of the form $A = BZ^T + E$ that is as good as the SVD, but can be computed in $O(mnk)$ time. The next two lemmas achieve this goal by extending the algorithms in [19, 22] (see Sections 5 and 6 for their proofs). We will use these factorizations to design fast column reconstruction algorithms in Theorems 3, 4, and 5.

Lemma 10 (Randomized fast spectral norm SVD). Given $A \in \mathbb{R}^{m \times n}$ of rank $\rho$, a target rank $2 \leq k < \rho$, and $0 < \epsilon < 1$, there exists an algorithm that computes a factorization $A = BZ^T + E$, with $B = AZ$, $Z^TZ = I_k$, and $EZ = 0_{m \times k}$ such that

$$\mathbb{E}\left[\|E\|_2\right] \leq \left(\sqrt{2} + \epsilon\right) \|A - A_k\|_2.$$

The proposed algorithm runs in $O\left(mnk\epsilon^{-1} \log\left(k^{-1} \min\{m, n\}\right)\right)$ time.
Lemma 11 (Randomized fast Frobenius norm SVD). Given $A \in \mathbb{R}^{m \times n}$ of rank $\rho$, a target rank $2 \leq k < \rho$, and $0 < \epsilon < 1$, there exists an algorithm that computes a factorization $A = BZ^T + E$, with $B = AZ$, $Z^TZ = I_k$, and $EZ = 0_{m \times k}$ such that

$$\mathbb{E}\left[\|E\|_F^2\right] \leq (1 + \epsilon) \|A - A_k\|_F^2.$$

The proposed algorithm runs in $O\left(mnk\epsilon^{-1}\right)$ time.

3.2 Sparse approximate decompositions of the identity 

Lemmas 8, 10 and 11 argue that, in order to achieve almost optimal column-based matrix reconstruction, we need a sampling matrix $S$ that preserves the rank of $Z$ and controls the error $\|ES(Z^TS)^+\|_\xi$. We present algorithms to compute such a matrix $S$ in Lemmas 12 and 13. These lemmas were motivated by an important linear-algebraic result for a decomposition of the identity presented by Batson et al. [1]. It is worth emphasizing that the result of [1] cannot be directly applied to the column reconstruction problem. Indeed, in our setting, it is necessary to control properties related to both matrices $Z$ and $E = A - BZ^T$ simultaneously. In the spectral-norm reconstruction case, we need to control the singular values of the two matrices; in the Frobenius-norm reconstruction case, we need to control singular values and Frobenius norms of two matrices. The following two lemmas are proven in Sections 7 and 8.

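Lemma 12 (Dual Set Spectral Sparsification). Let $\mathcal{V} = \{v_1, \ldots, v_n\}$ and $\mathcal{U} = \{u_1, \ldots, u_n\}$ be two equal cardinality decompositions of the identity, where $v_i \in \mathbb{R}^k$ ($k < n$), $u_i \in \mathbb{R}^\ell$ ($\ell \leq n$), $\sum_{i=1}^{n} v_i v_i^T = I_k$, and $\sum_{i=1}^{n} u_i u_i^T = I_\ell$. Given an integer $r$ with $k < r < n$, there exists a set of weights $s_i \geq 0$ ($i = 1, \ldots, n$), at most $r$ of which are non-zero, such that

$$\lambda_k\left(\sum_{i=1}^{n} s_i v_i v_i^T\right) \geq \left(1 - \sqrt{k/r}\right)^2 \quad \text{and} \quad \lambda_1\left(\sum_{i=1}^{n} s_i u_i u_i^T\right) \leq \left(1 + \sqrt{\ell/r}\right)^2.$$

The weights $s_i$ can be computed deterministically in $O\left(rn\left(k^2 + \ell^2\right)\right)$ time.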

In matrix notation, let $U$ and $V$ be the matrices whose rows are the vectors $u_i$ and $v_i$ respectively. We can now construct the sampling matrix $S \in \mathbb{R}^{n \times r}$ as follows: for $i = 1, \ldots, n$, if $s_i$ is non-zero then include $\sqrt{s_i}\, e_i$ as a column of $S$; here $e_i$ is the $i$-th standard basis vector¹. Using this matrix notation, the above lemma guarantees that $\sigma_k(V^TS) \geq 1 - \sqrt{k/r}$ and $\sigma_1(U^TS) \leq 1 + \sqrt{\ell/r}$. Clearly, $S$ may be viewed as a matrix that samples and rescales $r$ rows of $U$ and $V$ (columns of $U^T$ and $V^T$), namely the rows that correspond to non-zero weights $s_i$.

Lemma 13 (Dual Set Spectral-Frobenius Sparsification). Let $\mathcal{V} = \{v_1, \ldots, v_n\}$ be a decomposition of the identity, where $v_i \in \mathbb{R}^k$ ($k < n$) and $\sum_{i=1}^{n} v_i v_i^T = I_k$; let $\mathcal{A} = \{a_1, \ldots, a_n\}$ be an arbitrary set of vectors, where $a_i \in \mathbb{R}^\ell$. Then, given an integer $r$ such that $k < r < n$, there exists a set of weights $s_i \geq 0$ ($i = 1, \ldots, n$), at most $r$ of which are non-zero, such that

$$\lambda_k\left(\sum_{i=1}^{n} s_i v_i v_i^T\right) \geq \left(1 - \sqrt{k/r}\right)^2, \quad \text{and}$$

$$\mathrm{Tr}\left(\sum_{i=1}^{n} s_i a_i a_i^T\right) \leq \mathrm{Tr}\left(\sum_{i=1}^{n} a_i a_i^T\right) = \sum_{i=1}^{n} \|a_i\|_2^2.$$

The weights $s_i$ can be computed deterministically in $O\left(rnk^2 + n\ell\right)$ time.

In matrix notation (here $A$ denotes the matrix whose rows are the vectors $a_i$), the above lemma guarantees that $\sigma_k(V^TS) \geq 1 - \sqrt{k/r}$ and $\|A^TS\|_F^2 \leq \|A\|_F^2$.

¹ Note that we slightly abused notation: indeed, the number of columns of $S$ is less than or equal to $r$, since at most $r$ of the weights are non-zero. Here, we use $r$ to also denote the actual number of non-zero weights, which is equal to the number of columns of the matrix $S$.
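The passage from weights $s_i$ to the sampling matrix $S$ described after Lemma 12 is mechanical and can be sketched as follows. The weight vector below is made up for illustration; it is not the output of the algorithms of Lemmas 12 or 13 (whose construction is given in Sections 7 and 8), and the function name `sampling_matrix` is an assumption of this example.

```python
import numpy as np

def sampling_matrix(weights):
    """Build S whose columns are sqrt(s_i) * e_i for every non-zero weight s_i."""
    weights = np.asarray(weights, dtype=float)
    idx = np.flatnonzero(weights)                  # indices with s_i != 0
    S = np.zeros((weights.size, idx.size))
    S[idx, np.arange(idx.size)] = np.sqrt(weights[idx])
    return S

rng = np.random.default_rng(5)
n, k = 6, 2
V, _ = np.linalg.qr(rng.standard_normal((n, k)))   # rows v_i satisfy sum v_i v_i^T = I_k
weights = np.array([0.0, 1.5, 0.0, 2.0, 0.5, 0.0])  # hypothetical weights s_i
S = sampling_matrix(weights)

# V^T S S^T V equals the weighted sum of outer products sum_i s_i v_i v_i^T.
gram = V.T @ S @ S.T @ V
weighted_sum = sum(w * np.outer(v, v) for w, v in zip(weights, V))
sigma_k = np.linalg.svd(V.T @ S, compute_uv=False)[k - 1]
```

Here `sigma_k` is the quantity $\sigma_k(V^TS)$ that the lemmas bound from below; with arbitrary weights such as these, no particular bound should be expected.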

4 Proofs of our Main Results 

In this section, we leverage the main tools described in Section 3 in order to prove the results of 
Section 1.2 (Theorems 1 through 5). We start with a proof of Theorem 1, using Lemmas 9 and 12. 

Proof of Theorem 1. Apply the algorithm of Lemma 12 on the following two sets of vectors: the $n$ rows of the matrix $V_k$ and the $n$ rows of the matrix $V_{\rho-k}$. The output of the algorithm is a sampling and rescaling matrix $S \in \mathbb{R}^{n \times r}$ (see discussion after Lemma 12 in Section 3.2). Let $C = AS$ and note that $C$ consists of a subset of $r$ rescaled columns of $A$. Lemma 12 guarantees that $\sigma_k(V_k^TS) \geq 1 - \sqrt{k/r} > 0$ (assuming $r > k$), and so $\mathrm{rank}(V_k^TS) = k$. Also, $\sigma_1(V_{\rho-k}^TS) = \|V_{\rho-k}^TS\|_2 \leq 1 + \sqrt{(\rho-k)/r}$. Applying Lemma 9, we get

$$\|A - \Pi^2_{C,k}(A)\|_2^2 \leq \|A - A_k\|_2^2 + \|(A - A_k)S(V_k^TS)^+\|_2^2$$
$$\leq \|A - A_k\|_2^2 + \|(A - A_k)S\|_2^2 \, \|(V_k^TS)^+\|_2^2$$
$$= \|A - A_k\|_2^2 + \|U_{\rho-k}\Sigma_{\rho-k}V_{\rho-k}^TS\|_2^2 \, \|(V_k^TS)^+\|_2^2$$
$$\leq \|A - A_k\|_2^2 + \|\Sigma_{\rho-k}\|_2^2 \, \|V_{\rho-k}^TS\|_2^2 \, \|(V_k^TS)^+\|_2^2$$
$$\leq \|A - A_k\|_2^2 \left(1 + \frac{\left(1 + \sqrt{(\rho-k)/r}\right)^2}{\left(1 - \sqrt{k/r}\right)^2}\right),$$

where the last inequality follows because $\|\Sigma_{\rho-k}\|_2 = \|A - A_k\|_2$ and $\|(V_k^TS)^+\|_2 = 1/\sigma_k(V_k^TS) \leq 1/\left(1 - \sqrt{k/r}\right)$. Theorem 1 now follows by taking square roots of both sides and using $\sqrt{1 + x^2} \leq 1 + x$. The running time is equal to the time needed to compute $V_k$ and $V_{\rho-k}$ plus the running time of the algorithm in Lemma 12. Finally, we note that the rescaling of the columns of $C$ does not change the span of its columns and thus is irrelevant in the construction of $\Pi^2_{C,k}(A)$. ■
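The final "taking square roots" step of the proof can be illustrated with a few lines of plain Python. The function below evaluates the factor of Theorem 1 together with the square root of the squared bound from the last display; the parameter values are arbitrary and chosen only so that the arithmetic is easy to follow.

```python
import math

def theorem1_factors(rho, k, r):
    """Return (1 + x, sqrt(1 + x^2)) with x = (1 + sqrt((rho-k)/r)) / (1 - sqrt(k/r))."""
    if not k < r:
        raise ValueError("the bound requires r > k")
    x = (1 + math.sqrt((rho - k) / r)) / (1 - math.sqrt(k / r))
    return 1 + x, math.sqrt(1 + x * x)

# Illustrative values: rank 100, target rank 5, r = 20 sampled columns.
factor, sqrt_of_squared_bound = theorem1_factors(rho=100, k=5, r=20)
```

Since $x \geq 0$, the second value never exceeds the first, which is exactly the inequality $\sqrt{1 + x^2} \leq 1 + x$ used to pass from the squared bound to the statement of Theorem 1.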

Our next theorem describes a deterministic algorithm for spectral norm reconstruction that only needs to compute $V_k$ and will serve as a prequel to the proof of Theorem 3. The accuracy guarantee of this theorem is essentially identical to the one in Theorem 1, with $\rho - k$ being replaced by $n$.

Theorem 14. Given $A \in \mathbb{R}^{m \times n}$ of rank $\rho$ and a target rank $k < \rho$, there exists a deterministic polynomial-time algorithm to select $r > k$ columns of $A$ and form a matrix $C \in \mathbb{R}^{m \times r}$ such that

$$\|A - \Pi^2_{C,k}(A)\|_2 \leq \left(1 + \frac{1 + \sqrt{n/r}}{1 - \sqrt{k/r}}\right) \|A - A_k\|_2 = O\left(\sqrt{n/r}\right) \|A - A_k\|_2.$$

The matrix $C$ can be computed in $T_{V_k} + O(nrk^2)$ time, where $T_{V_k}$ is the time needed to compute the top $k$ right singular vectors of $A$.

Proof. The proof is very similar to the proof of Theorem 1, so we only highlight the differences. First, apply the algorithm of Lemma 12 on the following two sets of vectors: the $n$ rows of the matrix $V_k$ and the $n$ rows of the matrix $I_n$. The output of the algorithm is a sampling and rescaling matrix $S \in \mathbb{R}^{n \times r}$ (see discussion after Lemma 12 in Section 3.2). Let $C = AS$ and note that $C$ consists of a subset of $r$ rescaled columns of $A$. Lemma 12 guarantees that $\|I_nS\|_2 \leq 1 + \sqrt{n/r}$. We now replicate the proof of Theorem 1 up to the point where $\|(A - A_k)S(V_k^TS)^+\|_2^2$ is bounded. We continue as follows:

$$\|(A - A_k)S(V_k^TS)^+\|_2^2 = \|(A - A_k)I_nS(V_k^TS)^+\|_2^2 \leq \|A - A_k\|_2^2 \, \|I_nS\|_2^2 \, \|(V_k^TS)^+\|_2^2.$$

Again, as in Theorem 1, the rescaling of the columns of $C$ is irrelevant to the construction of $\Pi^2_{C,k}(A)$. To analyze the running time of the proposed algorithm, we need to look more closely at Lemma 12 and the related algorithm. The proof of this lemma in Section 7 argues that the algorithm of Lemma 12 can be implemented in $O(nrk^2)$ time. The total running time is the time needed to compute $V_k$ plus $O(nrk^2)$. ■

Proof of Theorem 3. In order to prove Theorem 3 we will follow the proof of Theorem 1 using 
Lemma 10 (a fast matrix factorization) instead of Lemma 9 (the exact SVD of A). More specifically, 
instead of using the top k right singular vectors of A (the matrix V^), we use the matrix Z G M"^^ 
of Lemma 10. We now apply the algorithm of Lemma 12 on the following two sets of vectors: the 
n rows of the matrix Z and the n rows of the matrix I„. The output of the algorithm is a sampling 
and rescaling matrix S G M"^'' (see discussion after Lemma 12 in ection 3.2). Let C = AS and note 
that C consists of a subset of r resettled columns of A. The proof of Theorem 3 is now identical 
to the proof of Theorem 14, except for using Lemma 8 instead of Lemma 9 in the first step of the 
proof: 

||A-n2c,fe(A)||i < 
< 

where E is the residual error from the matrix factorization of Lemma 10. Taking square roots (using 
■s/l + < 1 + x) and using the bounds guaranteed by Lemma 12 for ||InS||2 and ||(Z'^S)''"||2, we 
obtain a bound in terms of ||E||2. Finally, since E is a random variable, taking expectations and 
applying the bound of Lemma 10 concludes the proof of the theorem. Again, the rescaling of the 
columns of C is irrelevant to the construction of 11^ ^^.(A). The running time is the time needed to 
compute the matrix Z from Lemma 10 plus an additional 0{nrk'^) time as in Theorem 14. ■ 

Proof of Theorem 2. First, apply the algorithm of Lemma 13 on the following two sets of vectors: the $n$ rows of the matrix $V_k$ and the $n$ rows of the matrix $(A - A_k)^T$. The output of the algorithm is a sampling and rescaling matrix $S \in \mathbb{R}^{n\times r}$ (see discussion after Lemma 12 in Section 3.2). Let $C = AS$ and note that $C$ consists of a subset of $r$ rescaled columns of $A$. We follow the proof of Theorem 1 in the previous section up to the point where we need to bound the term $\|(A - A_k)S(V_k^TS)^+\|_F$. By strong submultiplicativity,

$$\|(A - A_k)S(V_k^TS)^+\|_F^2 \le \|(A - A_k)S\|_F^2\,\|(V_k^TS)^+\|_2^2.$$

To conclude, we apply Lemma 13 to bound the two terms in the right-hand side of the above inequality. Again, the rescaling of the columns of $C$ is irrelevant to the construction of $\Pi^F_{C,k}(A)$. The running time of the proposed algorithm is equal to the time needed to compute $V_k$ plus the time needed to compute $A - A_k$ (which is equal to $O(mnk)$ given $V_k$) plus the time needed to run the algorithm of Lemma 13, which is equal to $O(nrk^2 + nm)$. ■

Proof of Theorem 4. We will follow the proof of Theorem 2, but, as with the proof of Theorem 3, instead of using the top $k$ right singular vectors of $A$ (the matrix $V_k$), we will use the matrix $Z$ of Lemma 11 that is computed via a fast, approximate matrix factorization. More specifically, let $Z$ be the matrix of Lemma 11 and run the algorithm of Lemma 13 on the following two sets of vectors: the $n$ rows of the matrix $Z$ and the $n$ rows of the matrix $E^T$. The output of the algorithm is a sampling and rescaling matrix $S \in \mathbb{R}^{n\times r}$ (see discussion after Lemma 12 in Section 3.2). Let $C = AS$ and note that $C$ consists of a subset of $r$ rescaled columns of $A$. The proof of Theorem 4 is now identical to the proof of Theorem 2, except for using Lemma 8 instead of Lemma 9. Ultimately, we obtain

$$\|A - \Pi^F_{C,k}(A)\|_F^2 \le \|E\|_F^2 + \|ES(Z^TS)^+\|_F^2 \le \|E\|_F^2 + \|ES\|_F^2\,\|(Z^TS)^+\|_2^2 \le \left(1 + \left(1 - \sqrt{k/r}\right)^{-2}\right)\|E\|_F^2.$$

The last inequality follows from the bounds of Lemma 13. The theorem now follows by taking the expectation of both sides and using Lemma 11 to bound $\mathbb{E}[\|E\|_F^2]$. Again, the rescaling of the columns of $C$ is irrelevant to the construction of $\Pi^F_{C,k}(A)$. The overall running time is derived by replacing the time needed to compute $V_k$ in Theorem 2 with the time needed to compute the fast approximate factorization of Lemma 11. ■

Proof of Theorem 5. Finally, we will prove Theorem 5 by combining the results of Theorem 4 
(a constant factor approximation algorithm) with one round of adaptive sampling. We first recall 
the following lemma, which has appeared in prior work [10, 15]. 

Lemma 15. Given a matrix $A \in \mathbb{R}^{m\times n}$, a target rank $k$, and an integer $r$, there exists an algorithm to select $r$ columns from $A$ to form the matrix $C \in \mathbb{R}^{m\times r}$ such that

$$\mathbb{E}\left[\|A - \Pi^F_{C,k}(A)\|_F^2\right] \le \|A - A_k\|_F^2 + \frac{k}{r}\|A\|_F^2.$$

The matrix $C$ can be computed in $O(mn + r\log r)$ time.

Algorithms for the above lemma choose $r$ columns of $A$ in $r$ independent identically distributed (i.i.d.) trials, where in each trial a column of $A$ is sampled with probability proportional to its norm-squared (importance sampling). We now state Theorem 2.1 of [9], which builds upon Lemma 15 to provide an adaptive sampling procedure that improves the accuracy guarantees of Lemma 15.
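
The norm-squared sampling described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names `column_sampling_probabilities` and `sample_columns`, the list-of-rows matrix layout, and the `seed` parameter are hypothetical and do not come from [9], [10], or [15]. Passing a residual matrix instead of `A` gives the kind of sampling distribution used by the adaptive procedure.

```python
import random

def column_sampling_probabilities(M):
    """Return p_i = ||i-th column of M||_2^2 / ||M||_F^2 (M is a list of rows)."""
    col_sq = [sum(row[j] ** 2 for row in M) for j in range(len(M[0]))]
    total = sum(col_sq)
    return [c / total for c in col_sq]

def sample_columns(M, r, seed=0):
    """Draw r column indices in r i.i.d. trials (sampling with replacement)."""
    rng = random.Random(seed)
    probs = column_sampling_probabilities(M)
    return rng.choices(range(len(probs)), weights=probs, k=r)
```

Because the trials are independent, the same column index may be drawn more than once.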

Lemma 16. Given a matrix $A \in \mathbb{R}^{m\times n}$, let $C_1 \in \mathbb{R}^{m\times r}$ consist of $r$ columns of $A$, and define the residual $B = A - C_1C_1^+A \in \mathbb{R}^{m\times n}$. For $i = 1, \ldots, n$, let

$$p_i = \|b_i\|_2^2 / \|B\|_F^2,$$

where $b_i$ is the $i$-th column of the matrix $B$. Sample a further $s$ columns from $A$ in $s$ i.i.d. trials, where in each trial the $i$-th column is chosen with probability $p_i$. Let $C_2 \in \mathbb{R}^{m\times s}$ contain the $s$ sampled columns and let $C = [C_1\ C_2] \in \mathbb{R}^{m\times(r+s)}$ contain the columns of both $C_1$ and $C_2$, all of which are columns of $A$. Then, for any integer $k > 0$,

$$\mathbb{E}\left[\|A - \Pi^F_{C,k}(A)\|_F^2\right] \le \|A - A_k\|_F^2 + \frac{k}{s}\|B\|_F^2.$$



Note that Lemma 16 is an extension of Lemma 15, which can be derived by setting $C_1$ to be empty in Lemma 16. We are now ready to prove Theorem 5. First, fix $d > 1$ and define $c_0 = (1 + \epsilon_0)\left(1 + 1/(1 - \sqrt{k/\hat r})^2\right)$, where $\hat r = \lceil dk \rceil$. (We will choose $d$ and $\epsilon_0$ later.) Now run the algorithm of Theorem 4 to sample $\hat r = \lceil dk \rceil$ columns of $A$ and form the matrix $C_1$. Then, run the adaptive sampling algorithm of Lemma 16 with $B = A - C_1C_1^+A$ and sample a further $s = \lceil c_0k/\epsilon \rceil$ columns of $A$ to form the matrix $C_2$. Let $C = [C_1\ C_2] \in \mathbb{R}^{m\times(\hat r+s)}$ contain all the sampled columns. We will analyze the expectation $\mathbb{E}\left[\|A - \Pi^F_{C,k}(A)\|_F^2\right]$. Using the bound of Lemma 16, we first compute the expectation with respect to $C_2$ conditioned on $C_1$:

$$\mathbb{E}_{C_2}\left[\|A - \Pi^F_{C,k}(A)\|_F^2 \,\middle|\, C_1\right] \le \|A - A_k\|_F^2 + \frac{k}{s}\|B\|_F^2.$$
We now compute the expectation with respect to $C_1$ (only $B$ depends on $C_1$):

$$\mathbb{E}_{C_1}\left[\mathbb{E}_{C_2}\left[\|A - \Pi^F_{C,k}(A)\|_F^2 \,\middle|\, C_1\right]\right] \le \|A - A_k\|_F^2 + \frac{k}{s}\,\mathbb{E}_{C_1}\left[\|A - C_1C_1^+A\|_F^2\right].$$

By the law of iterated expectation, the left hand side is exactly equal to $\mathbb{E}\left[\|A - \Pi^F_{C,k}(A)\|_F^2\right]$. We now use the accuracy guarantee of Theorem 4 and our definition of $c_0$ to bound

$$\mathbb{E}_{C_1}\left[\|A - C_1C_1^+A\|_F^2\right] \le \mathbb{E}_{C_1}\left[\|A - \Pi^F_{C_1,k}(A)\|_F^2\right] \le c_0\|A - A_k\|_F^2.$$

Using the bound in (4), we obtain

$$\mathbb{E}\left[\|A - \Pi^F_{C,k}(A)\|_F^2\right] \le \|A - A_k\|_F^2\left(1 + c_0k/s\right).$$

Finally, recall that for our choice of $s$, $s \ge c_0k/\epsilon$, and so we obtain the relative error bound. The number of columns needed is $r = \hat r + s = dk + c_0k/\epsilon$. Set $d = (1 + \alpha)^2$, where $\alpha = \sqrt[3]{(1 + \epsilon_0)/\epsilon}$. After some algebra, this yields $r = k\left(\alpha^3 + (1 + \alpha)^3\right) = \frac{2k}{\epsilon}\left(1 + O(\epsilon_0 + \epsilon^{1/3})\right)$ sampled columns. The time needed to compute the matrix $C$ is the sum of three terms: the running time of Theorem 4 (which is $O(mnk\epsilon_0^{-1} + n\hat rk^2)$), plus the time needed to compute $A - C_1C_1^+A$ (which is $O(mn\hat r)$), plus the time needed to run the algorithm of Lemma 16 (which is $O(mn + s\log s)$). Assume $r < n$
(otherwise the problem is trivial), set $\epsilon_0 = \epsilon^{2/3}$ and use $d = O(\epsilon^{-2/3})$ to get the final asymptotic run time. ■
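
As a worked illustration of the column count in the proof above, the following sketch evaluates $\alpha$, $d$, $\hat r$, $c_0$, and $s$ for given $k$, $\epsilon$, and $\epsilon_0$. The function name `theorem5_column_budget` and the returned dictionary are hypothetical; the formulas are the ones stated in the proof, with the ceilings applied as written.

```python
import math

def theorem5_column_budget(k, eps, eps0):
    """Evaluate the parameters used in the proof of Theorem 5."""
    alpha = ((1.0 + eps0) / eps) ** (1.0 / 3.0)   # alpha = cube root of (1 + eps0) / eps
    d = (1.0 + alpha) ** 2                        # d = (1 + alpha)^2
    r_hat = math.ceil(d * k)                      # columns sampled via Theorem 4
    c0 = (1.0 + eps0) * (1.0 + 1.0 / (1.0 - math.sqrt(k / r_hat)) ** 2)
    s = math.ceil(c0 * k / eps)                   # columns added by adaptive sampling
    return {"alpha": alpha, "r_hat": r_hat, "c0": c0, "s": s, "r": r_hat + s}
```

For example, `theorem5_column_budget(10, 0.1, 0.05)` can be compared against $k(\alpha^3 + (1+\alpha)^3)$; the two ceilings add at most two columns to that expression.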
We conclude by noting that the number of columns required for relative error approximation is approximately a factor of two from optimal, since $k/\epsilon$ columns are necessary (see [8] and Section 9.1). We get an improved running time equal to $O(mnk + nk^3 + n\log\epsilon^{-1})$ using just a constant factor
more columns by setting d and eo in the proof to constants (for example, setting d = 100, cq = 
^ K, i results in sampling ^(1 + o(l)) columns). 



5 Proof of Lemma 10: Approximate SVD in the Spectral Norm 

Consider the following algorithm, described in Corollary 10.10 of [19]. The algorithm takes as inputs a matrix $A \in \mathbb{R}^{m\times n}$ of rank $\rho$, an integer $2 \le k < \rho$, an integer $q \ge 1$, and an integer $p \ge 2$. Set $r = k + p$ and construct the matrix $Y \in \mathbb{R}^{m\times r}$ as follows:

1. Generate an $n \times r$ standard Gaussian matrix $R$ whose entries are i.i.d. $\mathcal{N}(0, 1)$ variables.

2. Return $Y = (AA^T)^qAR \in \mathbb{R}^{m\times r}$.
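
The two steps above can be sketched in plain Python. This is a minimal illustration rather than the implementation of [19]: the function names and the list-of-rows matrix layout are assumptions, and the sketch applies $AA^T$ by $q$ successive multiplications without the intermediate orthonormalization that a numerically careful implementation would typically add.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def power_range_sketch(A, k, p, q, seed=0):
    """Return Y = (A A^T)^q A R for an n x (k + p) standard Gaussian matrix R."""
    rng = random.Random(seed)
    n = len(A[0])
    r = k + p
    R = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)]
    Y = matmul(A, R)                      # Y = A R
    At = transpose(A)
    for _ in range(q):
        Y = matmul(A, matmul(At, Y))      # one application of A A^T
    return Y
```

Each pass through the loop costs two matrix products with $A$, so the work grows linearly in $q$.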

The running time of the above algorithm is $O(mnrq)$. Corollary 10.10 of [19] presents the following bound:

$$\mathbb{E}\left[\|A - YY^+A\|_2\right] \le \left(1 + \sqrt{\frac{k}{p-1}} + \frac{e\sqrt{k+p}}{p}\sqrt{\min\{m,n\} - k}\right)^{\frac{1}{2q+1}}\|A - A_k\|_2,$$

where $e = 2.718\ldots$. To the best of our understanding, the above result is not immediately applicable to the construction of a factorization of the form $A = BZ^T + E$, because $Y$ contains $r > k$ columns. Lemma 25 below, which strengthens Corollary 10.10 in [19], argues that the matrix $\Pi^2_{Y,k}(A)$ "contains" the desired factorization $BZ^T$. Recall that while we cannot compute $\Pi^2_{Y,k}(A)$ efficiently, we can compute a constant-factor approximation, which is sufficient for our purposes. The proof of Lemma 25 is very similar to the proof of Corollary 10.10 of [19], with the only difference being our starting point: instead of using Theorem 9.1 of [19] we use Lemma 9 of our work. To prove Lemma 25, we will need several results for standard Gaussian matrices, projection matrices, and Hölder's inequality. The following seven lemmas are all borrowed from [19].
Lemma 17 (Proposition 10.1 in [19]). Fix matrices $X$, $Y$, and draw a standard Gaussian matrix $R$ of appropriate dimensions. Then,

$$\mathbb{E}\left[\|XRY\|_2\right] \le \|X\|_2\|Y\|_F + \|X\|_F\|Y\|_2.$$

Lemma 18 (Proposition 10.2 in [19]). For $k, p \ge 2$, draw a standard Gaussian matrix $R \in \mathbb{R}^{k\times(k+p)}$. Then,

$$\mathbb{E}\left[\|R^+\|_2\right] \le \frac{e\sqrt{k+p}}{p},$$

where $e = 2.718\ldots$.

Lemma 19 (Proposition 10.1 in [19]). Fix matrices $X$, $Y$, and a standard Gaussian matrix $R$ of appropriate dimensions. Then,

$$\mathbb{E}\left[\|XRY\|_F^2\right] = \|X\|_F^2\|Y\|_F^2.$$

Lemma 20 (Proposition 10.2 in [19]). For $k, p \ge 2$, draw a standard Gaussian matrix $R \in \mathbb{R}^{k\times(k+p)}$. Then,

$$\mathbb{E}\left[\|R^+\|_F^2\right] = \frac{k}{p-1}.$$

Lemma 21 (proved in [19]). For integers $k, p \ge 1$, and a standard Gaussian $R \in \mathbb{R}^{k\times(k+p)}$, the rank of $R$ is equal to $k$ with probability one.

Lemma 22 (Proposition 8.6 in [19]). Let $P$ be a projection matrix. For any matrix $X$ of appropriate dimensions and an integer $q \ge 0$,

$$\|PX\|_2 \le \left(\|P(XX^T)^qX\|_2\right)^{\frac{1}{2q+1}}.$$

Lemma 23 (Hölder's inequality). Let $x$ be a positive random variable. Then, for any $h \ge 1$,

$$\mathbb{E}[x] \le \left(\mathbb{E}\left[x^h\right]\right)^{\frac{1}{h}}.$$



The following lemma provides an alternative definition for $\Pi^\xi_{C,k}(A)$ which will be useful in subsequent proofs. Recall from Section 2.2 that we can write $\Pi^\xi_{C,k}(A) = CX^\xi$, where

$$X^\xi = \operatorname*{argmin}_{\Psi \in \mathbb{R}^{r\times n}:\ \mathrm{rank}(\Psi) \le k} \|A - C\Psi\|_\xi^2.$$

The next lemma basically says that $\Pi^\xi_{C,k}(A)$ is the projection of $A$ onto the rank-$k$ subspace spanned by $CX^\xi$, and that no other subspace in the column space of $C$ is better.

Lemma 24. For $A \in \mathbb{R}^{m\times n}$ and $C \in \mathbb{R}^{m\times r}$, integer $r > k$, let $\Pi^\xi_{C,k}(A) = CX^\xi$. Then,

$$\|A - CX^\xi\|_\xi^2 = \|A - (CX^\xi)(CX^\xi)^+A\|_\xi^2 \le \|A - (CY)(CY)^+A\|_\xi^2,$$

where $Y \in \mathbb{R}^{r\times n}$ is any matrix of rank at most $k$.

Proof. The inequality will follow from the optimality of $X^\xi$ because $Y(CY)^+A$ has rank at most $k$. So we only need to prove the first equality. Again, by the optimality of $X^\xi$ and because $X^\xi(CX^\xi)^+A$ has rank at most $k$, $\|A - CX^\xi\|_\xi^2 \le \|A - (CX^\xi)(CX^\xi)^+A\|_\xi^2$. To get the reverse inequality, we will use matrix-Pythagoras as follows:

$$\|A - CX^\xi\|_\xi^2 = \left\|\left(I_m - (CX^\xi)(CX^\xi)^+\right)A - CX^\xi\left(I_n - (CX^\xi)^+A\right)\right\|_\xi^2 \ge \left\|\left(I_m - (CX^\xi)(CX^\xi)^+\right)A\right\|_\xi^2.$$
Lemma 25 (Extension of Corollary 10.10 of [19]). Let $A$ be a matrix in $\mathbb{R}^{m\times n}$ of rank $\rho$, let $k$ be an integer satisfying $2 \le k < \rho$, and let $r = k + p$ for some integer $p \ge 2$. Let $R \in \mathbb{R}^{n\times r}$ be a standard Gaussian matrix (i.e., a matrix whose entries are drawn in i.i.d. trials from $\mathcal{N}(0,1)$). Define $B = (AA^T)^qA$ and compute $Y = BR$. Then, for any $q \ge 0$,

$$\mathbb{E}\left[\|A - \Pi^2_{Y,k}(A)\|_2\right] \le \left(1 + \sqrt{\frac{k}{p-1}} + \frac{e\sqrt{k+p}}{p}\sqrt{\min\{m,n\} - k}\right)^{\frac{1}{2q+1}}\|A - A_k\|_2.$$

Proof. Let $\Pi^2_{Y,k}(A) = YX_1$ and $\Pi^2_{Y,k}(B) = YX_2$, where $X_1$ is optimal for $A$ and $X_2$ for $B$. From Lemma 24,

$$\|A - \Pi^2_{Y,k}(A)\|_2 = \|(I_m - (YX_1)(YX_1)^+)A\|_2 \le \|(I_m - (YX_2)(YX_2)^+)A\|_2.$$

From Lemma 22 and using the fact that $I_m - (YX_2)(YX_2)^+$ is a projection,

$$\|(I_m - (YX_2)(YX_2)^+)A\|_2 \le \left\|(I_m - (YX_2)(YX_2)^+)(AA^T)^qA\right\|_2^{\frac{1}{2q+1}} = \left\|B - (YX_2)(YX_2)^+B\right\|_2^{\frac{1}{2q+1}} = \left\|B - \Pi^2_{Y,k}(B)\right\|_2^{\frac{1}{2q+1}},$$

where the last step follows from Lemma 24. We conclude that

$$\|A - \Pi^2_{Y,k}(A)\|_2 \le \|B - \Pi^2_{Y,k}(B)\|_2^{\frac{1}{2q+1}}.$$

$Y$ is generated using a random $R$, so taking expectations and applying Hölder's inequality, we get

$$\mathbb{E}\left[\|A - \Pi^2_{Y,k}(A)\|_2\right] \le \left(\mathbb{E}\left[\|B - \Pi^2_{Y,k}(B)\|_2\right]\right)^{\frac{1}{2q+1}}. \qquad (1)$$

We now focus on bounding the term on the right-hand side of the above equation. Let the SVD of $B$ be $B = U_B\Sigma_BV_B^T$, with the top rank $k$ factors from the SVD of $B$ being $U_{B,k}$, $\Sigma_{B,k}$, $V_{B,k}$ and the corresponding trailing factors being $U_{B,\tau}$, $\Sigma_{B,\tau}$ and $V_{B,\tau}$. Let $\rho_B$ be the rank of $B$. Let

$$\Omega_1 = V_{B,k}^TR \in \mathbb{R}^{k\times r} \quad\text{and}\quad \Omega_2 = V_{B,\tau}^TR \in \mathbb{R}^{(\rho_B - k)\times r}.$$

The Gaussian distribution is rotationally invariant, so $\Omega_1$, $\Omega_2$ are also standard Gaussian matrices which are stochastically independent because $V_B^T$ can be extended to a full rotation. Thus, $V_{B,k}^TR$ and $V_{B,\tau}^TR$ also have entries that are i.i.d. $\mathcal{N}(0,1)$ variables. We now apply Lemma 9 to reconstruct $B$, with $\xi = 2$ and $S = R$. The rank requirement in Lemma 9 is satisfied because, from Lemma 21, the rank of $\Omega_1$ is equal to $k$ (as it is a standard normal matrix), and thus the matrix $R$ satisfies the rank assumptions of Lemma 9. We get that,

$$\|B - \Pi^2_{Y,k}(B)\|_2^2 \le \|B - B_k\|_2^2 + \|(B - B_k)R(V_{B,k}^TR)^+\|_2^2 \le \|\Sigma_{B,\tau}\|_2^2 + \|\Sigma_{B,\tau}\Omega_2\Omega_1^+\|_2^2.$$

Using $\sqrt{x^2 + y^2} \le x + y$, we conclude that $\|B - \Pi^2_{Y,k}(B)\|_2 \le \|\Sigma_{B,\tau}\|_2 + \|\Sigma_{B,\tau}\Omega_2\Omega_1^+\|_2$. We now need to take the expectation with respect to $\Omega_1$, $\Omega_2$. We first take the expectation with respect to $\Omega_2$, conditioning on $\Omega_1$. We then take the expectation w.r.t. $\Omega_1$. Since only the second term is stochastic, using Lemma 17, we have:

$$\mathbb{E}_{\Omega_2}\left[\|\Sigma_{B,\tau}\Omega_2\Omega_1^+\|_2 \,\middle|\, \Omega_1\right] \le \|\Sigma_{B,\tau}\|_2\|\Omega_1^+\|_F + \|\Sigma_{B,\tau}\|_F\|\Omega_1^+\|_2.$$

We now take the expectation with respect to $\Omega_1$. To bound $\mathbb{E}[\|\Omega_1^+\|_2]$, we use Lemma 18. To bound $\mathbb{E}[\|\Omega_1^+\|_F]$, we first use Hölder's inequality to bound $\mathbb{E}[\|\Omega_1^+\|_F] \le \mathbb{E}[\|\Omega_1^+\|_F^2]^{1/2}$, and then we use Lemma 20. Since $\|\Sigma_{B,\tau}\|_F \le \sqrt{\min(m,n) - k}\,\|\Sigma_{B,\tau}\|_2$, collecting our results together, we obtain:

$$\mathbb{E}\left[\|B - \Pi^2_{Y,k}(B)\|_2\right] \le \left(1 + \sqrt{\frac{k}{p-1}} + \frac{e\sqrt{k+p}}{p}\sqrt{\min(m,n) - k}\right)\|B - B_k\|_2.$$

To conclude, combine with eqn. (1) and note that $\|B - B_k\|_2 = \|A - A_k\|_2^{2q+1}$. ■



We now have all the necessary ingredients to prove Lemma 10. Let $Y$ be the matrix of Lemma 25. Set $p = k$ and

$$q = \left\lceil \frac{\log\left(1 + \sqrt{\frac{k}{p-1}} + \frac{e\sqrt{k+p}}{p}\sqrt{\min\{m,n\} - k}\right)}{2\log\left(1 + \epsilon/\sqrt{2}\right)} - \frac{1}{2} \right\rceil,$$

so that

$$\left(1 + \sqrt{\frac{k}{p-1}} + \frac{e\sqrt{k+p}}{p}\sqrt{\min\{m,n\} - k}\right)^{\frac{1}{2q+1}} \le 1 + \frac{\epsilon}{\sqrt{2}}.$$

Then,

$$\mathbb{E}\left[\|A - \Pi^2_{Y,k}(A)\|_2\right] \le \left(1 + \frac{\epsilon}{\sqrt{2}}\right)\|A - A_k\|_2. \qquad (2)$$

Given $Y$, let $Q$ be an orthonormal basis for its column space. Then, using the algorithm of Section 2.2 and applying Lemma 7 we can construct the matrix $Q(Q^TA)_k$ such that

$$\|A - Q(Q^TA)_k\|_2 \le \sqrt{2}\,\|A - \Pi^2_{Y,k}(A)\|_2.$$

Clearly, $Q(Q^TA)_k$ is a rank $k$ matrix; let $Z \in \mathbb{R}^{n\times k}$ denote the matrix containing the right singular vectors of $Q(Q^TA)_k$, so $Q(Q^TA)_k = XZ^T$. Note that $Z$ is equal to the right singular vectors of the matrix $(Q^TA)_k$ (because $Q$ has orthonormal columns), and so $Z$ has already been computed at the second step of the algorithm of Section 2.2. Since $E = A - AZZ^T$ and $\|A - AZZ^T\|_2 \le \|A - XZ^T\|_2$ for any $X$, we have

$$\|E\|_2 \le \|A - Q(Q^TA)_k\|_2 \le \sqrt{2}\,\|A - \Pi^2_{Y,k}(A)\|_2.$$

Note that, by construction, $EZ = 0_{m\times k}$. The running time follows by adding the running time of the algorithm at the beginning of this section and the running time of the algorithm of Section 2.2.
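
The choice of $q$ in this proof is the smallest integer that makes the $(2q+1)$-th root of the Lemma 25 factor at most $1 + \epsilon/\sqrt{2}$. The following sketch computes it; the function name `power_iterations_for_lemma10` and the optional `p` argument (defaulting to $p = k$) are hypothetical, and the sketch assumes $p \ge 2$.

```python
import math

def power_iterations_for_lemma10(m, n, k, eps, p=None):
    """Return (q, x): x is the Lemma 25 factor and q the number of power iterations."""
    p = k if p is None else p
    x = (1.0 + math.sqrt(k / (p - 1))
         + math.e * math.sqrt(k + p) / p * math.sqrt(min(m, n) - k))
    target = math.log(1.0 + eps / math.sqrt(2.0))
    # x**(1 / (2q + 1)) <= 1 + eps / sqrt(2)  iff  q >= log(x) / (2 * target) - 1/2
    q = math.ceil(math.log(x) / (2.0 * target) - 0.5)
    return max(q, 0), x
```

Since $x$ grows only like $\sqrt{\min\{m,n\}}$, the resulting $q$ grows logarithmically in the matrix dimensions for fixed $\epsilon$.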



6 Proof of Lemma 11: Approximate SVD in the Frobenius Norm 

Consider the following algorithm, described in Theorem 10.5 of [19]. The algorithm takes as inputs a matrix $A \in \mathbb{R}^{m\times n}$ of rank $\rho$, an integer $2 \le k < \rho$, and an integer $p \ge 2$. Set $r = k + p$ and construct the matrix $Y \in \mathbb{R}^{m\times r}$ as follows:

1. Generate an $n \times r$ standard Gaussian matrix $R$ whose entries are i.i.d. $\mathcal{N}(0, 1)$ variables.

2. Return $Y = AR \in \mathbb{R}^{m\times r}$.

The running time of the above algorithm is $O(mnr)$. Theorem 10.5 in [19] presents the following bound:

$$\mathbb{E}\left[\|A - YY^+A\|_F\right] \le \left(1 + \frac{k}{p-1}\right)^{\frac{1}{2}}\|A - A_k\|_F.$$

To the best of our understanding, the above result is not immediately applicable to the construction of a factorization of the form $A = BZ^T + E$ (as in Lemma 11) because $Y$ contains $r > k$ columns.
Lemma 26 (Extension of Theorem 10.5 of [19]). Let $A$ be a matrix in $\mathbb{R}^{m\times n}$ of rank $\rho$, let $k$ be an integer satisfying $2 \le k < \rho$, and let $r = k + p$ for some integer $p \ge 2$. Let $R \in \mathbb{R}^{n\times r}$ be a standard Gaussian matrix (i.e., a matrix whose entries are drawn in i.i.d. trials from $\mathcal{N}(0,1)$) and compute $Y = AR$. Then,

$$\mathbb{E}\left[\|A - \Pi^F_{Y,k}(A)\|_F^2\right] \le \left(1 + \frac{k}{p-1}\right)\|A - A_k\|_F^2.$$

Proof. We construct the matrix $Y$ as described in the beginning of this section. Let the rank of $A$ be $\rho$ and let $A = U\Sigma V^T$ be the SVD of $A$. Define

$$\Omega_1 = V_k^TR \in \mathbb{R}^{k\times r} \quad\text{and}\quad \Omega_2 = V_{\rho-k}^TR \in \mathbb{R}^{(\rho-k)\times r}.$$

The Gaussian distribution is rotationally invariant, so $\Omega_1$, $\Omega_2$ are also standard Gaussian matrices which are stochastically independent because $V^T$ can be extended to a full rotation. Thus, $V_k^TR$ and $V_{\rho-k}^TR$ also have entries that are i.i.d. $\mathcal{N}(0,1)$ variables. We now apply Lemma 9 to reconstructing $A$, with $\xi = F$ and $S = R$. Recall that from Lemma 21, the rank of $\Omega_1$ is equal to $k$, and thus the matrix $R$ satisfies the rank assumptions of Lemma 9. We have that,

$$\|A - \Pi^F_{Y,k}(A)\|_F^2 \le \|A - A_k\|_F^2 + \|\Sigma_{\rho-k}\Omega_2\Omega_1^+\|_F^2,$$

where $A - A_k = U_{\rho-k}\Sigma_{\rho-k}V_{\rho-k}^T$. To conclude, we take the expectation on both sides, and since only the second term on the right hand side is stochastic, we bound as follows:

$$\mathbb{E}\left[\|\Sigma_{\rho-k}\Omega_2\Omega_1^+\|_F^2\right] \overset{(a)}{=} \mathbb{E}_{\Omega_1}\left[\mathbb{E}_{\Omega_2}\left[\|\Sigma_{\rho-k}\Omega_2\Omega_1^+\|_F^2 \,\middle|\, \Omega_1\right]\right] \overset{(b)}{=} \mathbb{E}_{\Omega_1}\left[\|\Sigma_{\rho-k}\|_F^2\|\Omega_1^+\|_F^2\right] \overset{(c)}{=} \|\Sigma_{\rho-k}\|_F^2\,\mathbb{E}\left[\|\Omega_1^+\|_F^2\right] \overset{(d)}{=} \frac{k}{p-1}\|\Sigma_{\rho-k}\|_F^2.$$

(a) follows from the law of iterated expectation; (b) follows from Lemma 19; (c) follows because $\|\Sigma_{\rho-k}\|_F^2$ is a constant; (d) follows from Lemma 20. We conclude the proof by noting that $\|\Sigma_{\rho-k}\|_F = \|A - A_k\|_F$. ■

We now have all the necessary ingredients to conclude the proof of Lemma 11. Let $Y$ be the matrix of Lemma 26, and let $Q$ be an orthonormal basis for its column space. Then, using the algorithm of Section 2.2 and applying Lemma 7 we can construct the matrix $Q(Q^TA)_k$ such that

$$\|A - Q(Q^TA)_k\|_F^2 = \|A - \Pi^F_{Y,k}(A)\|_F^2.$$

Clearly, $Q(Q^TA)_k$ is a rank $k$ matrix; let $Z \in \mathbb{R}^{n\times k}$ be the matrix containing the right singular vectors of $Q(Q^TA)_k$, so $Q(Q^TA)_k = XZ^T$. Note that $Z$ is equal to the right singular vectors of the matrix $(Q^TA)_k$ (because $Q$ has orthonormal columns), and thus $Z$ has already been computed at the second step of the algorithm of Section 2.2. Since $E = A - AZZ^T$ and $\|A - AZZ^T\|_F \le \|A - XZ^T\|_F$ for any $X$, we have

$$\|E\|_F^2 \le \|A - Q(Q^TA)_k\|_F^2 = \|A - \Pi^F_{Y,k}(A)\|_F^2.$$

To conclude, take expectations on both sides, use Lemma 26 to bound $\mathbb{E}\left[\|A - \Pi^F_{Y,k}(A)\|_F^2\right]$, and set $p = \lceil k/\epsilon + 1 \rceil$ to obtain:

$$\mathbb{E}\left[\|E\|_F^2\right] \le \left(1 + \frac{k}{p-1}\right)\|A - A_k\|_F^2 \le (1 + \epsilon)\|A - A_k\|_F^2.$$

By construction, $EZ = 0_{m\times k}$. The running time follows by adding the running time of the algorithm at the beginning of this section and the running time of the algorithm of Section 2.2.
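
The oversampling choice $p = \lceil k/\epsilon + 1 \rceil$ can be checked numerically with a short sketch; the function name `oversampling_for_lemma11` is hypothetical.

```python
import math

def oversampling_for_lemma11(k, eps):
    """Return (p, factor) with p = ceil(k / eps + 1) and factor = 1 + k / (p - 1)."""
    p = math.ceil(k / eps + 1)
    return p, 1.0 + k / (p - 1)
```

Because $p - 1 \ge k/\epsilon$, the returned factor never exceeds $1 + \epsilon$, which is the second inequality in the display above.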
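Input:

• $\mathcal{V} = \{v_1, \ldots, v_n\}$, with $\sum_{i=1}^n v_iv_i^T = I_k$ ($k < n$)

• $\mathcal{U} = \{u_1, \ldots, u_n\}$, with $\sum_{i=1}^n u_iu_i^T = I_\ell$ ($\ell \le n$)

• integer $r$, with $k < r < n$

Output: A vector of weights $s = [s_1, \ldots, s_n]$, with $s_i \ge 0$ and at most $r$ non-zero $s_i$'s.

1. Initialize $s_0 = 0_{n\times 1}$, $A_0 = 0_{k\times k}$, $B_0 = 0_{\ell\times\ell}$.

2. For $\tau = 0, \ldots, r - 1$

• Compute $\mathrm{L}_\tau$ and $\mathrm{U}_\tau$ from eqn. (6).

• Find an index $j$ in $\{1, \ldots, n\}$ such that

$$U(u_j, \delta_U, B_\tau, \mathrm{U}_\tau) \le L(v_j, \delta_L, A_\tau, \mathrm{L}_\tau). \qquad (3)$$

• Let

$$t^{-1} = \frac{1}{2}\left(U(u_j, \delta_U, B_\tau, \mathrm{U}_\tau) + L(v_j, \delta_L, A_\tau, \mathrm{L}_\tau)\right). \qquad (4)$$

• Update the $j$-th component of $s$, $A_\tau$ and $B_\tau$:

$$s_{\tau+1}[j] = s_\tau[j] + t, \quad A_{\tau+1} = A_\tau + tv_jv_j^T, \quad\text{and}\quad B_{\tau+1} = B_\tau + tu_ju_j^T. \qquad (5)$$

3. Return $s = r^{-1}\left(1 - \sqrt{k/r}\right)\cdot s_r$.

Algorithm 1: Deterministic Dual Set Spectral Sparsification.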

7 Dual Set Spectral Sparsification: proof of Lemma 12 

In this section, we prove Lemma 12, which generalizes Theorem 3.1 in [1]. Indeed, setting V = U 
reproduces the spectral sparsification result of Theorem 3.1 in [1]. We will provide a constructive 
proof of the lemma and we start by describing the algorithm that computes the weights $s_i$, $i = 1, \ldots, n$.

7.1 The Algorithm 

The fundamental idea underlying Algorithm 1 is the greedy selection of vectors that satisfy a 
number of desired properties in each step. These properties will eventually imply the eigenvalue 
bounds of Lemma 12. We start by defining several quantities that will be used in the description 
of the algorithm and its proof. First, fix two constants: 
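$$\delta_L = 1; \qquad \delta_U = \frac{1 + \sqrt{\ell/r}}{1 - \sqrt{k/r}}.$$

Given $k$, $\ell$, and $r$ (all inputs of Algorithm 1), and a parameter $\tau = 0, \ldots, r - 1$, define two parameters $\mathrm{L}_\tau$ and $\mathrm{U}_\tau$ as follows:

$$\mathrm{L}_\tau = \tau - \sqrt{rk}; \qquad \mathrm{U}_\tau = \delta_U\left(\tau + \sqrt{\ell r}\right). \qquad (6)$$

We next define the lower and upper functions $\underline{\phi}(\mathrm{L}, A)$ ($\mathrm{L} \in \mathbb{R}$ and $A \in \mathbb{R}^{k\times k}$) and $\bar{\phi}(\mathrm{U}, B)$ ($\mathrm{U} \in \mathbb{R}$ and $B \in \mathbb{R}^{\ell\times\ell}$) as follows:

$$\underline{\phi}(\mathrm{L}, A) = \sum_{i=1}^{k}\frac{1}{\lambda_i(A) - \mathrm{L}}; \qquad \bar{\phi}(\mathrm{U}, B) = \sum_{i=1}^{\ell}\frac{1}{\mathrm{U} - \lambda_i(B)}. \qquad (7)$$

Let $L(v, \delta_L, A, \mathrm{L})$ be a function with four inputs (a vector $v \in \mathbb{R}^{k\times 1}$, $\delta_L \in \mathbb{R}$, a matrix $A \in \mathbb{R}^{k\times k}$, and $\mathrm{L} \in \mathbb{R}$):

$$L(v, \delta_L, A, \mathrm{L}) = \frac{v^T\left(A - (\mathrm{L} + \delta_L)I_k\right)^{-2}v}{\underline{\phi}(\mathrm{L} + \delta_L, A) - \underline{\phi}(\mathrm{L}, A)} - v^T\left(A - (\mathrm{L} + \delta_L)I_k\right)^{-1}v. \qquad (8)$$

Similarly, let $U(u, \delta_U, B, \mathrm{U})$ be a function with four inputs (a vector $u \in \mathbb{R}^{\ell\times 1}$, $\delta_U \in \mathbb{R}$, a matrix $B \in \mathbb{R}^{\ell\times\ell}$, and $\mathrm{U} \in \mathbb{R}$):

$$U(u, \delta_U, B, \mathrm{U}) = \frac{u^T\left((\mathrm{U} + \delta_U)I_\ell - B\right)^{-2}u}{\bar{\phi}(\mathrm{U}, B) - \bar{\phi}(\mathrm{U} + \delta_U, B)} + u^T\left((\mathrm{U} + \delta_U)I_\ell - B\right)^{-1}u. \qquad (9)$$

Algorithm 1 runs in $r$ steps. The initial vector of weights $s_0$ is initialized to the all-zero vector. At each step $\tau = 0, \ldots, r - 1$, the algorithm selects a pair of vectors $(u_j, v_j)$ that satisfy eqn. (3), computes the associated weight $t$ from eqn. (4), and updates two matrices and the vector of weights appropriately, as specified in eqn. (5).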
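
The steps of Algorithm 1 can be sketched in plain Python for small inputs. This is an illustrative sketch under several assumptions, not the authors' implementation: all function names are hypothetical, matrices are lists of rows, the barrier functions are evaluated through traces of explicit inverses (so no eigenvalues are computed), the first feasible index is taken in eqn. (3), and the constants are $\delta_L = 1$ and $\delta_U = (1+\sqrt{\ell/r})/(1-\sqrt{k/r})$.

```python
import math

def inverse(M):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [x / d for x in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]

def shifted(W, c, sign):
    """Return sign * (W - c I) for a square matrix W."""
    size = len(W)
    return [[sign * (W[i][j] - (c if i == j else 0.0)) for j in range(size)]
            for i in range(size)]

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def rank_one_update(W, t, x):
    """In-place update W <- W + t x x^T."""
    for a in range(len(x)):
        for b in range(len(x)):
            W[a][b] += t * x[a] * x[b]

def dual_set_sparsify(V, U, r):
    """V[i] is v_i (length k), U[i] is u_i (length ell); returns the rescaled weights."""
    n, k, ell = len(V), len(V[0]), len(U[0])
    d_low = 1.0
    d_up = (1.0 + math.sqrt(ell / r)) / (1.0 - math.sqrt(k / r))
    s = [0.0] * n
    A = [[0.0] * k for _ in range(k)]
    B = [[0.0] * ell for _ in range(ell)]
    for tau in range(r):
        low_bar = tau - math.sqrt(r * k)               # L_tau from eqn. (6)
        up_bar = d_up * (tau + math.sqrt(ell * r))     # U_tau from eqn. (6)
        # Inverses are computed once per step and shared by all n candidates.
        a_inv = inverse(shifted(A, low_bar + d_low, 1.0))   # (A - (L + d_low) I)^-1
        b_inv = inverse(shifted(B, up_bar + d_up, -1.0))    # ((U + d_up) I - B)^-1
        d_phi_low = trace(a_inv) - trace(inverse(shifted(A, low_bar, 1.0)))
        d_phi_up = trace(inverse(shifted(B, up_bar, -1.0))) - trace(b_inv)
        for j in range(n):
            av, bu = matvec(a_inv, V[j]), matvec(b_inv, U[j])
            low = dot(av, av) / d_phi_low - dot(V[j], av)   # lower function, eqn. (8)
            up = dot(bu, bu) / d_phi_up + dot(U[j], bu)     # upper function, eqn. (9)
            if up <= low:                                   # eqn. (3)
                break
        else:
            raise RuntimeError("no feasible index found (numerical trouble?)")
        t = 2.0 / (up + low)                                # eqn. (4)
        s[j] += t                                           # eqn. (5)
        rank_one_update(A, t, V[j])
        rank_one_update(B, t, U[j])
    scale = (1.0 - math.sqrt(k / r)) / r                    # final rescaling of the weights
    return [scale * w for w in s]
```

The sketch inverts dense matrices at every step, so it is meant for checking small examples rather than for the running times discussed in this section.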

7.2 Running time 

The algorithm runs in $r$ iterations. In each iteration, we evaluate the functions $U(u, \delta_U, B, \mathrm{U})$ and $L(v, \delta_L, A, \mathrm{L})$ at most $n$ times. Note that all $n$ evaluations for both functions need at most $O(k^3 + nk^2 + \ell^3 + n\ell^2)$ time, because the matrix inversions can be performed once for all $n$ evaluations. Finally, the updating step needs an additional $O(k^2 + \ell^2)$ time. Overall, the complexity of the algorithm is of the order $O(r(k^3 + nk^2 + \ell^3 + n\ell^2 + k^2 + \ell^2)) = O(rn(k^2 + \ell^2))$.

Note that when $\mathcal{U}$ is the standard basis ($\mathcal{U} = \{e_1, \ldots, e_n\}$ and $\ell = n$), the computations can be done much more efficiently: the eigenvalues of $B_\tau$ need not be computed explicitly (the expensive step), since they are available by inspection, being equal to the weights $s_\tau$. In the function $U(u, \delta_U, B, \mathrm{U})$, the functions $\bar{\phi}$ (given the eigenvalues) need only be computed once per iteration, in $O(n)$ time. The remaining terms can be computed in $O(1)$ time, because, for example, $e_i^T((\mathrm{U} + \delta_U)I - B)^{-1}e_i = (\mathrm{U} + \delta_U - s[i])^{-1}$. The running time now drops to $O(rnk^2)$, since all the operations on $\mathcal{U}$ only contribute $O(rn)$.



7.3 Proof of Correctness 

We prove that the output of Algorithm 1 satisfies Lemma 12. Our proof is similar to the proof of Theorem 3.1 of [1]. The main difference is that we need to accommodate two different sets of vectors. Let $W \in \mathbb{R}^{m\times m}$ be a positive semi-definite matrix with eigendecomposition

$$W = \sum_{i=1}^{m}\lambda_i(W)u_iu_i^T$$

and recall the functions $\underline{\phi}(\mathrm{L}, W)$, $\bar{\phi}(\mathrm{U}, W)$, $L(v, \delta_L, W, \mathrm{L})$, and $U(v, \delta_U, W, \mathrm{U})$ defined in eqns. (7), (8), and (9). We now quote two lemmas proven in [1] using the Sherman-Morrison-Woodbury identity; these lemmas allow one to control the smallest and largest eigenvalues of $W$ under a rank-one perturbation.

Lemma 27. Fix $\delta_L > 0$, $W \in \mathbb{R}^{m\times m}$, $v \in \mathbb{R}^m$, and $\mathrm{L} < \lambda_m(W)$. If $t > 0$ satisfies

$$t^{-1} \le L(v, \delta_L, W, \mathrm{L}),$$

then $\lambda_m(W + tvv^T) \ge \mathrm{L} + \delta_L$.

Lemma 28. Fix $\delta_U > 0$, $W \in \mathbb{R}^{m\times m}$, $v \in \mathbb{R}^m$, and $\mathrm{U} > \lambda_1(W)$. If $t$ satisfies

$$t^{-1} \ge U(v, \delta_U, W, \mathrm{U}),$$

then $\lambda_1(W + tvv^T) \le \mathrm{U} + \delta_U$.

Now recall that Algorithm 1 runs in $r$ steps. Initially, all $n$ weights are set to zero. Assume that at the $\tau$-th step ($\tau = 0, \ldots, r - 1$) the vector of weights $s_\tau = [s_\tau[1], \ldots, s_\tau[n]]$ has been constructed and let

$$A_\tau = \sum_{i=1}^{n}s_\tau[i]v_iv_i^T \quad\text{and}\quad B_\tau = \sum_{i=1}^{n}s_\tau[i]u_iu_i^T.$$

Note that both matrices $A_\tau$ and $B_\tau$ are positive semi-definite. We claim the following lemma which guarantees that the algorithm is well-defined. The proof is deferred to the next subsection.

Lemma 29. At the $\tau$-th step, for all $\tau = 0, \ldots, r - 1$, there exists an index $j$ in $\{1, \ldots, n\}$ such that setting the weight $t > 0$ as in eqn. (4) satisfies

$$U(u_j, \delta_U, B_\tau, \mathrm{U}_\tau) \le t^{-1} \le L(v_j, \delta_L, A_\tau, \mathrm{L}_\tau). \qquad (10)$$

Once an index $j$ and a weight $t > 0$ have been computed, Algorithm 1 updates the $j$-th weight in the vector of weights $s_\tau$ to create the vector of weights $s_{\tau+1}$. Clearly, at each of the $r$ steps, only one element of the vector of weights is updated. Since $s_0$ is initialized to the all-zeros vector, after all $r$ steps are completed, at most $r$ weights are non-zero. The following lemma argues that $\lambda_{\min}(A_\tau)$ and $\lambda_{\max}(B_\tau)$ are bounded.

Lemma 30. At the $\tau$-th step, for all $\tau = 0, \ldots, r - 1$, $\lambda_{\min}(A_\tau) \ge \mathrm{L}_\tau$ and $\lambda_{\max}(B_\tau) \le \mathrm{U}_\tau$.

Proof. Recall eqn. (6) and observe that $\mathrm{L}_0 = -\sqrt{rk} < 0$ and $\mathrm{U}_0 = \delta_U\sqrt{r\ell} > 0$. Thus, the lemma holds at $\tau = 0$. It is also easy to verify that $\mathrm{L}_{\tau+1} = \mathrm{L}_\tau + \delta_L$, and, similarly, $\mathrm{U}_{\tau+1} = \mathrm{U}_\tau + \delta_U$. Now, at the $\tau$-th step, given an index $j$ and a corresponding weight $t > 0$ satisfying eqn. (10), Lemmas 27 and 28 imply that

$$\lambda_{\min}(A_{\tau+1}) = \lambda_{\min}(A_\tau + tv_jv_j^T) \ge \mathrm{L}_\tau + \delta_L = \mathrm{L}_{\tau+1},$$

$$\lambda_{\max}(B_{\tau+1}) = \lambda_{\max}(B_\tau + tu_ju_j^T) \le \mathrm{U}_\tau + \delta_U = \mathrm{U}_{\tau+1}.$$

The lemma now follows by simple induction on $\tau$. ■

We are now ready to conclude the proof of Lemma 12. By Lemma 30, at the $r$-th step,

$$\lambda_{\max}(B_r) \le \mathrm{U}_r \quad\text{and}\quad \lambda_{\min}(A_r) \ge \mathrm{L}_r.$$

Recall the definitions of $\mathrm{U}_r$ and $\mathrm{L}_r$ from eqn. (6) and note that they are both positive and well-defined because $r > k$. Lemma 12 now follows after rescaling the vector of weights $s$ by $r^{-1}\left(1 - \sqrt{k/r}\right)$. Note that the rescaling does not change the number of non-zero elements of $s$, but does rescale all the eigenvalues of $A_r$ and $B_r$.
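7.4 Proof of Lemma 29

In order to prove Lemma 29 we will use the following averaging argument.

Lemma 31. At any step $\tau = 0, \ldots, r - 1$,

$$\sum_{i=1}^{n}U(u_i, \delta_U, B_\tau, \mathrm{U}_\tau) \le 1 - \sqrt{\frac{k}{r}} \le \sum_{i=1}^{n}L(v_i, \delta_L, A_\tau, \mathrm{L}_\tau).$$

Proof. For notational convenience, let $\bar{\phi}_\tau = \bar{\phi}(\mathrm{U}_\tau, B_\tau)$ and let $\underline{\phi}_\tau = \underline{\phi}(\mathrm{L}_\tau, A_\tau)$. At $\tau = 0$, $B_0 = 0$ and $A_0 = 0$ and thus $\bar{\phi}_0 = \ell/\mathrm{U}_0$ and $\underline{\phi}_0 = -k/\mathrm{L}_0$. Focus on the $\tau$-th step and assume that the algorithm has run correctly up to that point. Then, $\bar{\phi}_\tau \le \bar{\phi}_0$ and $\underline{\phi}_\tau \le \underline{\phi}_0$. Both are true at $\tau = 0$ and, assuming that the algorithm has run correctly until the $\tau$-th step, Lemmas 27 and 28 guarantee that $\bar{\phi}_\tau$ and $\underline{\phi}_\tau$ are non-increasing.

First, consider the upper bound on $U$. In the following derivation, $\lambda_i$ denotes the $i$-th eigenvalue of $B_\tau$. Using $\mathrm{Tr}(u^TXu) = \mathrm{Tr}(Xuu^T)$ and $\sum_i u_iu_i^T = I_\ell$, we get

$$\sum_{i=1}^{n}U(u_i, \delta_U, B_\tau, \mathrm{U}_\tau) = \frac{\mathrm{Tr}\left[(\mathrm{U}_{\tau+1}I_\ell - B_\tau)^{-2}\right]}{\bar{\phi}_\tau - \bar{\phi}(\mathrm{U}_{\tau+1}, B_\tau)} + \bar{\phi}(\mathrm{U}_{\tau+1}, B_\tau)$$

$$= \frac{1}{\delta_U} + \bar{\phi}_\tau - \frac{1}{\delta_U}\left(1 - \frac{\sum_{i}(\mathrm{U}_{\tau+1} - \lambda_i)^{-2}}{\sum_{i}(\mathrm{U}_{\tau+1} - \lambda_i)^{-1}(\mathrm{U}_\tau - \lambda_i)^{-1}}\right) - \delta_U\sum_{i}\frac{1}{(\mathrm{U}_\tau - \lambda_i)(\mathrm{U}_{\tau+1} - \lambda_i)}$$

$$\le \frac{1}{\delta_U} + \bar{\phi}_0.$$

The last line follows because the last two terms are negative (using the fact that $\mathrm{U}_{\tau+1} > \mathrm{U}_\tau > \lambda_i$) and $\bar{\phi}_\tau \le \bar{\phi}_0$. Now, using $\mathrm{U}_0 = \delta_U\sqrt{r\ell}$ and the definition of $\delta_U$, the upper bound follows:

$$\frac{1}{\delta_U} + \bar{\phi}_0 = \frac{1}{\delta_U}\left(1 + \sqrt{\frac{\ell}{r}}\right) = 1 - \sqrt{\frac{k}{r}}.$$

In order to prove the lower bound on $L$ we use a similar argument. Let $\lambda_i$ denote the $i$-th eigenvalue of $A_\tau$. Then,

$$\sum_{i=1}^{n}L(v_i, \delta_L, A_\tau, \mathrm{L}_\tau) = \frac{\mathrm{Tr}\left[(A_\tau - \mathrm{L}_{\tau+1}I_k)^{-2}\right]}{\underline{\phi}(\mathrm{L}_{\tau+1}, A_\tau) - \underline{\phi}_\tau} - \underline{\phi}(\mathrm{L}_{\tau+1}, A_\tau) = \frac{1}{\delta_L} - \underline{\phi}_\tau + \mathcal{E},$$

where

$$\mathcal{E} = \frac{1}{\delta_L}\left(\frac{\sum_{i=1}^{k}(\lambda_i - \mathrm{L}_{\tau+1})^{-2}}{\sum_{i=1}^{k}(\lambda_i - \mathrm{L}_{\tau+1})^{-1}(\lambda_i - \mathrm{L}_\tau)^{-1}} - 1\right) - \delta_L\sum_{i=1}^{k}\frac{1}{(\lambda_i - \mathrm{L}_{\tau+1})(\lambda_i - \mathrm{L}_\tau)}.$$

Assuming $\mathcal{E} \ge 0$ the claim follows immediately because $\delta_L = 1$ and $\underline{\phi}_\tau \le \underline{\phi}_0 = -k/\mathrm{L}_0 = k/\sqrt{rk} = \sqrt{k/r}$. Thus, we only need to show that $\mathcal{E} \ge 0$. From the Cauchy-Schwarz inequality, for $a_i, b_i \ge 0$, $\left(\sum_i a_ib_i\right)^2 \le \left(\sum_i a_i^2b_i\right)\left(\sum_i b_i\right)$ and thus, recalling that $\delta_L = 1$,

$$\mathcal{E}\sum_{i=1}^{k}\frac{1}{(\lambda_i - \mathrm{L}_{\tau+1})(\lambda_i - \mathrm{L}_\tau)} = \frac{1}{\delta_L}\sum_{i=1}^{k}\frac{1}{(\lambda_i - \mathrm{L}_{\tau+1})^2(\lambda_i - \mathrm{L}_\tau)} - \delta_L\left(\sum_{i=1}^{k}\frac{1}{(\lambda_i - \mathrm{L}_{\tau+1})(\lambda_i - \mathrm{L}_\tau)}\right)^2$$

$$\ge \frac{1}{\delta_L}\sum_{i=1}^{k}\frac{1}{(\lambda_i - \mathrm{L}_{\tau+1})^2(\lambda_i - \mathrm{L}_\tau)} - \delta_L\sum_{i=1}^{k}\frac{1}{(\lambda_i - \mathrm{L}_{\tau+1})^2(\lambda_i - \mathrm{L}_\tau)}\sum_{i=1}^{k}\frac{1}{\lambda_i - \mathrm{L}_\tau}$$

$$= \left(\frac{1}{\delta_L} - \delta_L\underline{\phi}_\tau\right)\sum_{i=1}^{k}\frac{1}{(\lambda_i - \mathrm{L}_{\tau+1})^2(\lambda_i - \mathrm{L}_\tau)}. \qquad (11)$$

To conclude our proof, first note that $\delta_L^{-1} - \delta_L\underline{\phi}_\tau \ge \delta_L^{-1} - \delta_L\underline{\phi}_0 = 1 - \sqrt{k/r} > 0$ (recall that $r > k$). Second, $\lambda_i > \mathrm{L}_{\tau+1}$ because

$$\lambda_{\min}(A_\tau) \ge \mathrm{L}_\tau + \frac{1}{\underline{\phi}_\tau} \ge \mathrm{L}_\tau + \frac{1}{\underline{\phi}_0} = \mathrm{L}_\tau + \sqrt{\frac{r}{k}} > \mathrm{L}_\tau + 1 = \mathrm{L}_{\tau+1}.$$

Combining these two observations with eqn. (11) we conclude that $\mathcal{E} \ge 0$. ■

Lemma 29 follows from Lemma 31 because the two inequalities must hold simultaneously for at least one index $j$.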

8 Dual-set Spectral-Frobenius Sparsification: proof of Lemma 13 

In this section we will provide a constructive proof of Lemma 13. Our proof closely follows the proof of Lemma 12, so we will only highlight the differences. We first discuss modifications to Algorithm 1. First of all, the new inputs are $\mathcal{V} = \{v_1, \ldots, v_n\}$ and $\mathcal{A} = \{a_1, \ldots, a_n\}$. The output is a set of $n$ non-negative weights $s_i$, at most $r$ of which are non-zero. We define the parameters

$$\delta_L = 1; \qquad \delta_U = \frac{\sum_{i=1}^{n}\|a_i\|_2^2}{1 - \sqrt{k/r}}; \qquad \mathrm{L}_\tau = \tau - \sqrt{rk}; \qquad \mathrm{U}_\tau = \tau\delta_U,$$

for all $\tau = 0, \ldots, r - 1$. Let $s_\tau$ denote the vector of weights at the $\tau$-th step of Algorithm 1 and initialize $s_0$ and $A_0$ as in Algorithm 1 ($B_0$ will not be necessary). We now define the function $U_F(a, \delta_U)$, where $a \in \mathbb{R}^{\ell}$ and $\delta_U \in \mathbb{R}$:

$$U_F(a, \delta_U) = \delta_U^{-1}a^Ta. \qquad (12)$$

Then, at the $\tau$-th step, the algorithm will pick an index $j$ and compute a weight $t > 0$ such that

$$U_F(a_j, \delta_U) \le t^{-1} \le L(v_j, \delta_L, A_\tau, \mathrm{L}_\tau). \qquad (13)$$

The algorithm updates the vector of weights and the matrix

$$A_\tau = \sum_{i=1}^{n}s_{\tau,i}v_iv_i^T.$$

It is worth noting that the algorithm does not need to update the matrix

$$B_\tau = \sum_{i=1}^{n}s_{\tau,i}a_ia_i^T,$$

because the function $U_F$ does not need $B_\tau$ as input. To prove the correctness of the algorithm we need the following two intermediate lemmas.

Lemma 32. At every step $\tau = 0, \ldots, r - 1$ there exists an index $j$ in $\{1, \ldots, n\}$ that satisfies eqn. (13).

Proof. The proof is very similar to the proof of Lemma 29 (via Lemma 31) so we only sketch the differences. First, note that the dynamics of $L$ have not been changed and thus the lower bound for the average of $L(v_j, \delta_L, A_\tau, \mathrm{L}_\tau)$ still holds. We only need to upper bound the average of $U_F(a_i, \delta_U)$ as in Lemma 31. Indeed,

$$\sum_{i=1}^{n}U_F(a_i, \delta_U) = \delta_U^{-1}\sum_{i=1}^{n}a_i^Ta_i = \delta_U^{-1}\sum_{i=1}^{n}\|a_i\|_2^2 = 1 - \sqrt{\frac{k}{r}},$$

where the last equality follows from the definition of $\delta_U$. ■

Lemma 33. Let $W \in \mathbb{R}^{\ell\times\ell}$ be a symmetric positive semi-definite matrix, let $a \in \mathbb{R}^{\ell}$ be a vector, and let $\mathrm{U} \in \mathbb{R}$ satisfy $\mathrm{U} \ge \mathrm{Tr}(W)$. If $t > 0$ satisfies

$$U_F(a, \delta_U) \le t^{-1},$$

then $\mathrm{Tr}(W + taa^T) \le \mathrm{U} + \delta_U$.

Proof. Using the conditions of the lemma and the definition of $U_F$ from eqn. (12),

$$\mathrm{Tr}(W + taa^T) - \mathrm{U} - \delta_U = \mathrm{Tr}(W) - \mathrm{U} + ta^Ta - \delta_U \le \mathrm{Tr}(W) - \mathrm{U} \le 0,$$

which concludes the proof of the lemma. ■

We can now combine Lemmas 27 and 33 to prove that at all steps $\tau = 0, \ldots, r - 1$,

$$\lambda_{\min}(A_\tau) \ge \mathrm{L}_\tau \quad\text{and}\quad \mathrm{Tr}(B_\tau) \le \mathrm{U}_\tau.$$

Note that after all $r$ steps are completed, $\mathrm{L}_r = r\left(1 - \sqrt{k/r}\right)$ and $\mathrm{U}_r = r\left(1 - \sqrt{k/r}\right)^{-1}\sum_{i=1}^{n}\|a_i\|_2^2$. A simple rescaling now concludes the proof. The running time of the (modified) Algorithm 1 is $O(nrk^2 + n\ell)$, where the latter term emerges from the need to compute the function $U_F(a_j, \delta_U)$ for all $j = 1, \ldots, n$ once throughout the algorithm.

9 Lower bounds 

9.1 Spectral Norm Approximation 

Theorem 34. For any $\alpha > 0$, any $k \ge 1$, and any $r \ge 1$, there exists a matrix $A \in \mathbb{R}^{m\times n}$ for which

$$\frac{\|A - CC^+A\|_2^2}{\|A - A_k\|_2^2} \ge \frac{n + \alpha^2}{r + \alpha^2}.$$

Here $C$ is any matrix that consists of $r$ columns of $A$. As $\alpha \to 0$, the lower bound is $n/r$ for the approximation ratio of spectral norm column-based matrix reconstruction.

Proof. We extend the lower bound in [6] to arbitrary $r > k$. Consider the matrix

$$A = [e_1 + \alpha e_2,\ e_1 + \alpha e_3,\ \ldots,\ e_1 + \alpha e_{n+1}] \in \mathbb{R}^{(n+1)\times n},$$

where $e_i \in \mathbb{R}^{n+1}$ are the standard basis vectors. Then,

$$A^TA = 1_n1_n^T + \alpha^2I_n, \quad \sigma_1^2(A) = n + \alpha^2, \quad\text{and}\quad \sigma_i^2(A) = \alpha^2 \text{ for } i > 1.$$

Thus, for all $k \ge 1$, $\|A - A_k\|_2^2 = \alpha^2$. Intuitively, as $\alpha \to 0$, $A$ is a rank-one matrix. Consider any $r$ columns of $A$ and note that, up to row permutations, all sets of $r$ columns of $A$ are equivalent. So, without loss of generality, let $C$ consist of the first $r$ columns of $A$. We now compute the optimal reconstruction of $A$ from $C$ as follows: let $a_j$ be the $j$-th column of $A$. In order to reconstruct $a_j$, we minimize $\|a_j - Cx\|_2^2$ over all vectors $x \in \mathbb{R}^r$. Note that if $j \le r$ then the reconstruction error is zero. For $j > r$, $a_j = e_1 + \alpha e_{j+1}$,

$$Cx = e_1\sum_{i=1}^{r}x_i + \alpha\sum_{i=1}^{r}x_ie_{i+1}.$$

Then,

$$\|a_j - Cx\|_2^2 = \left\|e_1\left(\sum_{i=1}^{r}x_i - 1\right) + \alpha\sum_{i=1}^{r}x_ie_{i+1} - \alpha e_{j+1}\right\|_2^2 = \left(\sum_{i=1}^{r}x_i - 1\right)^2 + \alpha^2\sum_{i=1}^{r}x_i^2 + \alpha^2.$$

The above quadratic form in $x$ is minimized when $x_i = (r + \alpha^2)^{-1}$ for all $i = 1, \ldots, r$. Let $\hat{A} = A - CC^+A$ and let the $j$-th column of $\hat{A}$ be $\hat{a}_j$. Then, for $j \le r$, $\hat{a}_j$ is an all-zeros vector; for $j > r$, $\hat{a}_j = \frac{\alpha^2}{r + \alpha^2}e_1 + \alpha e_{j+1} - \frac{\alpha}{r + \alpha^2}\sum_{i=1}^{r}e_{i+1}$. Thus,

$$\hat{A}^T\hat{A} = \begin{bmatrix} 0_{r\times r} & 0_{r\times(n-r)} \\ 0_{(n-r)\times r} & Z \end{bmatrix},$$

where

$$Z = \frac{\alpha^2}{r + \alpha^2}1_{n-r}1_{n-r}^T + \alpha^2I_{n-r}.$$

This immediately implies that

$$\|A - CC^+A\|_2^2 = \|\hat{A}\|_2^2 = \|\hat{A}^T\hat{A}\|_2 = \|Z\|_2 = \frac{(n - r)\alpha^2}{r + \alpha^2} + \alpha^2 = \frac{n + \alpha^2}{r + \alpha^2}\alpha^2.$$

This concludes our proof, because $\alpha^2 = \|A - A_k\|_2^2$.


9.2 Frobenius norm approximation 

Note that a lower bound for the ratio

$$\|A - \Pi^F_{C,k}(A)\|_F^2 / \|A - A_k\|_F^2$$

does not imply a lower bound for the ratio

$$\|A - CC^+A\|_F^2 / \|A - A_k\|_F^2,$$

because

$$\|A - CC^+A\|_F^2 / \|A - A_k\|_F^2 \le \|A - \Pi^F_{C,k}(A)\|_F^2 / \|A - A_k\|_F^2.$$

Also, note that Proposition 4 in [8] shows a lower bound equal to $(1 + k/2r)$ for the ratio

$$\|A - \Pi^F_{C,k}(A)\|_F^2 / \|A - A_k\|_F^2.$$

For completeness, we extend the bound of [8] to the ratio

$$\|A - CC^+A\|_F^2 / \|A - A_k\|_F^2.$$

In fact, we obtain a lower bound which is asymptotically $1 + k/r$. We start with the following lemma.

Lemma 35. For any $\alpha > 0$ and $r \ge 1$, there exists a matrix $A \in \mathbb{R}^{m\times n}$ for which

$$\frac{\|A - CC^+A\|_F^2}{\|A - A_1\|_F^2} \ge \frac{n - r}{n - 1}\left(1 + \frac{1}{r + \alpha^2}\right).$$

Proof. We employ the same matrix $A$ as in Theorem 34. So, it follows that

$$\|A - CC^+A\|_F^2 = \mathrm{Tr}(Z) = \alpha^2(n - r)\left(1 + \frac{1}{r + \alpha^2}\right),$$

and $\|A - A_1\|_F^2 = (n - 1)\alpha^2$, which concludes the proof. ■
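
The closed-form errors used in Theorem 34 and Lemma 35 can be checked numerically on a small instance. The sketch below is an illustration only: the function names are hypothetical, the residual columns are built from the least-squares coefficients $x_i = (r + \alpha^2)^{-1}$ for the first $r$ columns of $A = [e_1 + \alpha e_2, \ldots, e_1 + \alpha e_{n+1}]$, and the spectral norm is estimated by power iteration on the Gram matrix of the residual.

```python
import math

def residual_columns(n, r, alpha):
    """Columns j = r+1..n of A - C C^+ A, with C the first r columns of A."""
    x = 1.0 / (r + alpha ** 2)           # optimal coefficient for every selected column
    cols = []
    for j in range(r + 1, n + 1):        # 1-based column index
        col = [0.0] * (n + 1)
        col[0] = 1.0 - r * x             # component along e_1
        for i in range(1, r + 1):
            col[i] = -alpha * x          # component along e_{i+1}, i = 1..r
        col[j] += alpha                  # component along e_{j+1}
        cols.append(col)
    return cols

def lower_bound_ratios(n, r, alpha, iters=200):
    """Return (squared spectral ratio for any k >= 1, squared Frobenius ratio for k = 1)."""
    cols = residual_columns(n, r, alpha)
    frob_sq = sum(v * v for col in cols for v in col)
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    size = len(cols)
    vec = [1.0 / math.sqrt(size)] * size
    lam = 0.0
    for _ in range(iters):               # power iteration for the top eigenvalue
        w = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        lam = math.sqrt(sum(v * v for v in w))
        vec = [v / lam for v in w]
    return lam / alpha ** 2, frob_sq / ((n - 1) * alpha ** 2)
```

The two returned values can be compared with $(n + \alpha^2)/(r + \alpha^2)$ and $\frac{n-r}{n-1}\left(1 + \frac{1}{r + \alpha^2}\right)$ respectively; the sketch assumes $r < n$.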

Theorem 36. For any $\alpha > 0$, any $k \ge 1$, and any $r \ge 1$, there exists a matrix $B \in \mathbb{R}^{m\times n}$ for which

$$\frac{\|B - CC^+B\|_F^2}{\|B - B_k\|_F^2} \ge \frac{n - r}{n - k}\left(1 + \frac{k}{r + \alpha^2}\right).$$

Here $C$ is any matrix that consists of $r$ columns of $B$. As $\alpha \to 0$ and $n \to \infty$ the lower bound is $1 + (k/r)$ for the approximation ratio of Frobenius norm column-based matrix reconstruction.

Proof. The matrix $B$ is constructed as follows. Let $A$ have dimensions $\frac{n+k}{k} \times \frac{n}{k}$ and be constructed as in Theorem 34. $B$ contains $k$ copies of $A$ across the main diagonal. We sample $r$ columns in total; let $r_i$ be the number of columns selected in the $i$-th block, so that $r = \sum_{i=1}^{k}r_i$. Lemma 35 holds for each block, with $n$ and $r$ replaced by $n/k$ and $r_i$. We analyze the Frobenius norm error in each block independently. Let $Z_i$ be the error matrix in each block, as in the proof of Theorem 34. Then, using Lemma 35, the approximation error is equal to

$$\|B - CC^+B\|_F^2 = \sum_{i=1}^{k}\mathrm{Tr}(Z_i) = \alpha^2\sum_{i=1}^{k}\left(\frac{n}{k} - r_i\right)\left(1 + \frac{1}{r_i + \alpha^2}\right).$$

Minimizing this expression subject to the constraint that $\sum_{i=1}^{k}r_i = r$ gives $r_i = r/k$. The result follows after some algebra using $\|B - B_k\|_F^2 = (n - k)\alpha^2$. ■



10 Open Problems 

Several interesting questions remain unanswered; we highlight two. First, is it possible to improve the running time of the deterministic algorithms of Lemmas 12 and 13? Recently, Zouzias [25] made progress in improving the running time of the spectral sparsification result of [1]; can we get a similar improvement for the 2-set algorithms presented here? Second, in the parlance of Theorem 5, is there a deterministic algorithm that selects $O(k/\epsilon)$ columns from $A$ and guarantees relative-error accuracy for the error $\|A - \Pi^F_{C,k}(A)\|_F^2$? In a very recent development, [18] partially answers this question by extending the volume sampling approach of [6] to deterministically select $\frac{k}{\epsilon}(1 + o(1))$ columns and obtain a relative error bound for the term $\|A - CC^+A\|_F$. Notice that it is not obvious if [18] implies a similar deterministic bound for the error $\|A - \Pi^F_{C,k}(A)\|_F$.